1 Project Goals


The Cosynthesis at Board and MCM Levels for Digital Signal Processors (COMET) Project is a RASSP Technology Base Program at the University of Cincinnati. RASSP (Rapid Prototyping of Application Specific Signal Processors) is an Advanced Research Projects Agency, Electronic Systems Technology Office (ARPA/ESTO) program. The COMET project is monitored by the US Air Force Wright Laboratory under contract number F33615-93-C-1316.

The goal of the COMET project is to develop languages, techniques and tools for hardware/software cosynthesis of board- and MCM-level Digital Signal Processing (DSP) systems from very high level requirements specifications. A second goal is to develop a usage guide for the Level 2 Waveform and Vector Exchange Specification (WAVES) language. The COMET project includes the development of (1) VSPEC, a declarative interface requirements specification language for VHDL entities, (2) hardware/software cosynthesis techniques for embedded DSP systems, (3) hierarchical multi-technology hardware partitioning tools, (4) software compilation techniques to compile behavioral VHDL into C, and (5) a WAVES Level 2 usage guide, as well as (6) an exploration of WAVES usage in conjunction with BSDL and for hierarchical testing.

The COMET project statement of work is as follows:

1. Extend VHDL to create the VSPEC specification language (Requirement 3.2)

2. Develop a technology-driven VSPEC partitioner (Requirement 3.3)

3. Develop a VSPEC-embedded software translator (Requirement 3.4)

4. Integration and distribution (Requirement 3.5)

5. WAVES usage guide for electronic module design development (Requirement 3.6)
2 Accomplishments


The accomplishments of the COMET project are summarized as follows:

1. VSPEC Development (CDRL A007)

VSPEC, as developed under this effort, is a Larch interface language for VHDL. VSPEC provides a declarative specification mechanism for defining (i) axiomatic requirements, (ii) activation conditions, (iii) internal state, (iv) constraints, and (v) abstract architectures for systems. VSPEC is fully compatible with VHDL and provides requirement definitions for the interfaces of entities, functions and procedures.

With the language definition complete, a formal semantics for VSPEC was defined using the Larch Shared Language (LSL). This formal semantics precisely defines the meaning of VSPEC constructs and is used for verification. The VSPEC parser is currently being extended to generate LSL directly for use in verification tools.

2. VSPEC Partitioner (CDRL A008)

Several partitioning approaches were developed under this project. Notable among these were the REBOUND tool and the genetic partitioner for codesigns.

The REBOUND tool generates structural architectures. Accordingly, in the current version of the hardware/software partitioning tool, concurrent statements are limited to component instantiations. The approach is, however, extensible to other concurrent statements such as processes and blocks.

The genetic partitioner explores hardware/software codesigns based on a relaxation-based retiming strategy. It evaluates a large number of hardware alternatives and hardware/software bindings. To aid this process, a detailed performance estimator for pipelined and nonpipelined codesigns has been developed.

3. VSPEC-Embedded Software Translator

Two tools for software synthesis were developed as part of this effort. The first was a stand-alone parser built around an ad hoc VHDL front end. This system generated code for the Texas Instruments TMS320 series DSP processor. Example systems included (i) a compander system, (ii) an FFT subsystem, (iii) an IFFT subsystem, and (iv) an IIR filter. Each example was coded in VHDL-S, synthesized into C and evaluated on a TMS320 prototyping system.

The synthesized examples demonstrated the capability to generate C for the VHDL-S subset. Further, the initial example set demonstrated the ability to generate (i) a generic operating system kernel and (ii) interface routines to support executing the C code. VHDL is inherently parallel while C is inherently sequential, so the translator transforms each VHDL-S process into a C process. These processes are managed by the simple operating system kernel, which uses message passing for interprocess communication. C routines are also generated to manage interfaces between the software and associated hardware devices; these are used primarily for I/O associated with the DSP processor.
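The process-to-kernel mapping described above can be illustrated with a minimal C sketch. It assumes that each translated VHDL-S process becomes a C step function that runs until it would suspend and then returns control to a cooperative round-robin kernel, and that processes exchange data only through a single-slot mailbox. The names (`run_kernel`, `struct mailbox`, `producer_step`, `consumer_step`) are hypothetical and are not taken from the COMET translator, whose actual kernel interface is not described here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-slot mailbox used as the message channel. */
struct mailbox { bool full; int value; };

/* Each translated VHDL process becomes a C "step" function that runs
   until it would suspend (a VHDL wait) and then returns to the kernel. */
typedef void (*process_step)(void *ctx);

struct task { process_step step; void *ctx; };

/* Minimal cooperative round-robin kernel: each round runs every task once. */
static void run_kernel(struct task *tasks, int ntasks, int rounds) {
    for (int r = 0; r < rounds; r++)
        for (int t = 0; t < ntasks; t++)
            tasks[t].step(tasks[t].ctx);
}

/* Example processes: a producer emits 1, 2, 3, ... and a consumer sums
   the doubled values, communicating only through the mailbox. */
struct producer_ctx { struct mailbox *out; int next; };
struct consumer_ctx { struct mailbox *in; int sum; };

static void producer_step(void *p) {
    struct producer_ctx *c = p;
    if (!c->out->full) {            /* send only when the slot is free */
        c->out->value = c->next++;
        c->out->full = true;
    }
}

static void consumer_step(void *p) {
    struct consumer_ctx *c = p;
    if (c->in->full) {              /* receive when a message is waiting */
        c->sum += 2 * c->in->value;
        c->in->full = false;
    }
}
```

A cooperative design keeps the kernel small, which suits a DSP target; the cost is that every translated process must return promptly instead of blocking.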

The second software translation system took the initial results from the stand-alone parser and incorporated them into the SAVANT environment, which provides a much richer and more stable platform for building the translator. The object model was extended to include C publishing routines, and additional enhancements were added. The most significant addition was the ability to generate generic C from VHDL-S. The generic C code is standard C with TMS320-specific additions. This code was tested on both Solaris and Linux platforms.

The final translator delivered here can generate code for either the TMS320 or a generic C system. It is based on the SAVANT toolset, but has not been ported to the most current SAVANT release.

4. Integration and Distribution

All VSPEC software has been integrated and transferred to the VHDL community through publications, presentations and repository access. Access to the software by anonymous File Transfer Protocol (FTP) can be arranged by contacting the PI of this project. Several publications resulting from this project are appended to this report.

5. WAVES Usage Guides (CDRL A009)

A WAVES Level-2 usage guide has been developed, along with two detailed case studies illustrating the use of WAVES Level-2. A document describing the use of WAVES for testing boundary scan devices has also been developed. A final document describes the use of WAVES in conjunction with BSDL.
APPENDIX A:

Board and MCM Level Synthesis for Embedded Systems:
The COMET Cosynthesis Environment *

Ranga Vemuri, Hal Carter and Perry Alexander
Department of Electrical and Computer Engineering
University of Cincinnati, ML 30
Cincinnati, Ohio 45221-0030
Ph: 513-556-4784; Email: ranga.vemuri@uc.edu


Abstract

COMET is a cosynthesis environment for application-specific electronic signal processing modules. COMET is capable of automatically synthesizing multicomponent hardware-software systems at the board and MCM levels. In the COMET environment, system-level specifications are written in VSPEC, a declarative annotation language for VHDL entities. COMET contains various VHDL-centered tools for hardware-software partitioning, MCM synthesis, ASIC synthesis, software compilation and performance analysis, and various WAVES-centered tools for board-, MCM- and ASIC-level testing.


1 Overview

COMET (Cosynthesis at Board and MCM Levels for Digital Signal Processors) is a hardware-software cosynthesis environment for embedded signal processing modules. COMET users can synthesize single-board application-specific DSP (digital signal processing) architectures. These target architectures, illustrated in Figure 1, can contain application-specific ASICs, FPGAs, MCMs, and off-the-shelf hardware components, along with an off-the-shelf processor that executes application-specific software as well as kernel functions.

The users' view of COMET is shown in Figure 2. In a typical top-down design process, COMET users begin by writing a specification of the system's functional requirements and constraints in VSPEC. The hardware-software partitioning tool then generates a top-level hardware-software architecture. The partitioning tool uses a library of reusable components, each of which is a DSP algorithm that is bound, or can be bound, to hardware or software. The component library also contains a set of off-the-shelf processors. The output of the partitioning tool is an architecture of hardware and software components whose behavior is specified in procedural VHDL. In this architecture, the software components are bound to an off-the-shelf processor and the hardware components are bound to various ASIC and packaging technologies. Hardware components in the design are submitted to hardware synthesis tools, and software components to software synthesis tools. An interface synthesis tool synthesizes all the interface logic needed to support intercomponent hardware-software communication protocols. Finally, an architecture integration tool composes the various components into a coherent board-level design that can be processed by commercial board-level place and route tools.

*The research reported in this paper is being conducted at the University of Cincinnati and is supported in part by the ARPA RASSP program monitored by the Wright Lab, US Air Force, under contract no. F33615-93-C-1316, and by the Solid State Electronics Directorate of the Wright Laboratory of the US Air Force under contract number F33615-91-C-1S11.

The COMET environment also contains WAVES-based test generation tools and performance analysis tools. COMET tools are also interfaced to various commercial and university tools, especially from the RASSP community, to facilitate simulation, logic synthesis, synthesis of board-level glue logic, and ASIC, MCM and board-level layout synthesis.


Figure 1: COMET's Target Architectures

Figure 2: COMET Cosynthesis Environment


2 VSPEC Specification Language

VSPEC is a declarative annotation language for VHDL entities. Through VSPEC, designers specify the requirements the system design should meet and the constraints on its implementation. A VSPEC specification consists of a collection of logical statements and declarations that annotate a VHDL entity construct. Consider the following entity specification of a multiplexor:


entity mux is
  port ( d0, d1, cntrl : in bit;
         output : out bit );
end mux;

In this example, the entity names the device and defines input and output ports. However, there is no indication of how the multiplexor functions or what performance constraints it must adhere to. A VHDL architecture describes the behavior or structure of an entity. Behavior can be described through communicating, concurrently executing sequential processes. Structure can be described through component instantiation and interconnection. VHDL allows the user to specify the behavior of a system by defining a single artifact (an architecture) embodying that behavior. Although alternative behaviors may be specified by multiple architectures of the same entity, these architectures must be explicitly enumerated. Implementation biases are therefore introduced while formulating the functional requirements, since the user is forced to commit to one or more designs.

The VSPEC language was developed to support the definition of requirements prior to the specification of designs. VSPEC has constructs that allow its users to declaratively specify input pre-conditions, output post-conditions, state variables, constraints, and other requirements at the entity level. The following is a VSPEC definition of a simple multiplexor:

entity mux is
  port ( d0, d1, cntrl : in bit;
         output : out bit );
  ensures
    output = ((d0 and cntrl) or
              (d1 and not cntrl));
  constrained by
    power < 4 and
    size < 20;
end mux;

This VSPEC entity describes the interface to the component as well as the desired function and constraints. The ensures clause declaratively states the function of the multiplexor. This definition allows many different implementations to be developed for this specification, as long as each implementation meets the requirement stated here. The constrained by clause specifies constraints placed on the power and area of the entity.
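To make the meaning of the ensures clause concrete, the following C sketch checks candidate multiplexor implementations against the post-condition over all eight input combinations. Any implementation that passes is acceptable under the specification, which is the freedom the declarative clause leaves to the designer. The function names are hypothetical, and exhaustive checking is only practical here because the entity has three single-bit inputs; it is not a description of how COMET verifies VSPEC.

```c
#include <assert.h>
#include <stdbool.h>

/* Post-condition from the ensures clause of mux:
   output = (d0 and cntrl) or (d1 and not cntrl). */
static bool mux_ensures(bool d0, bool d1, bool cntrl, bool output) {
    return output == ((d0 && cntrl) || (d1 && !cntrl));
}

/* A hypothetical implementation: a conditional select. */
static bool mux_impl_select(bool d0, bool d1, bool cntrl) {
    return cntrl ? d0 : d1;
}

/* A hypothetical, deliberately wrong implementation: inputs swapped. */
static bool mux_impl_swapped(bool d0, bool d1, bool cntrl) {
    return cntrl ? d1 : d0;
}

typedef bool (*mux_impl)(bool, bool, bool);

/* Check an implementation against the post-condition for all 2^3 inputs. */
static bool satisfies_mux_spec(mux_impl impl) {
    for (int v = 0; v < 8; v++) {
        bool d0 = v & 1, d1 = (v >> 1) & 1, cntrl = (v >> 2) & 1;
        if (!mux_ensures(d0, d1, cntrl, impl(d0, d1, cntrl)))
            return false;
    }
    return true;
}
```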
The VSPEC interface language affects only the VHDL entity declaration. Six VSPEC clauses are allowed in the entity:

• assumes logical-expression;

Describes the pre-conditions that must be met before this entity can be used. The logical-expression is defined over the set of inputs of the device.

• ensures logical-expression;

Describes the post-conditions that must be true when the entity functions correctly. The logical-expression is defined as a relation between the inputs of the device and its outputs.

• constrained by logical-expression;

Describes the constraints placed upon the entity. These constraints include size, timing, heat dissipation, power consumption and clock speed. The logical-expression is defined over predefined variables representing potential constraints.

• state typed-identifier-list;

A list of typed variables used to store the state of the entity. These variables maintain their values from one entity invocation to the next.

• modifies identifier-list;

The list of variables and signals this entity can modify. All elements listed must be defined in the state clause or in the entity's port declaration with mode out, inout, or buffer.

• VHDL-type based on logical-expression;

Associates a user-defined VHDL type with a formal, logical definition. This allows inferences involving user-defined types.

Architectures in VSPEC: A general architecture is a collection of interconnected high-level specifications that serves as a template for system definition. The general requirements of each component are known and the interactions between components are known, but the specifics of the implementation may not be. The VHDL architecture construct supports specification of interconnections among entities. Whether the entity structures referenced by the architecture have associated architectures determines whether each entity has only requirements or also a design associated with it.


Figure 3 shows a specification of a batch sequential sort and search system. The entity structures associated with each component in batch-seq are specified using VSPEC with no specific associated algorithm. The sort component must produce a sorted output, and the search component must find a key given a sorted input. Algorithms for each, perhaps in the form of behavioral architectures, must be specified at a later time.
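The sort and bin_search requirements in Figure 3 can be read as executable contracts. The C sketch below is an illustration under simplifying assumptions: elements are plain `int` values that serve as their own keys, `sort_impl` (insertion sort) and `bin_search_impl` (binary search) are just one hypothetical choice of the algorithms left open by the specification, and the `assert` calls play the role of the assumes and ensures clauses. The search returns an index, or -1 when no element has key `k`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ordered(a): predicate used by the sort ensures and bin_search assumes clauses. */
static bool ordered(const int *a, size_t n) {
    for (size_t i = 1; i < n; i++)
        if (a[i - 1] > a[i]) return false;
    return true;
}

/* bag(a) = bag(b): same elements with the same multiplicities (equal lengths). */
static bool same_bag(const int *a, const int *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        size_t ca = 0, cb = 0;
        for (size_t j = 0; j < n; j++) {
            if (a[j] == a[i]) ca++;
            if (b[j] == a[i]) cb++;
        }
        if (ca != cb) return false;
    }
    return true;
}

/* One possible algorithm for sort: insertion sort from in[] into out[]. */
static void sort_impl(const int *in, int *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int x = in[i];
        size_t j = i;
        while (j > 0 && out[j - 1] > x) { out[j] = out[j - 1]; j--; }
        out[j] = x;
    }
}

/* One possible algorithm for bin_search; returns the index of k or -1. */
static long bin_search_impl(const int *in, size_t n, int k) {
    assert(ordered(in, n));               /* assumes ordered(input) */
    size_t lo = 0, hi = n;                /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (in[mid] == k) return (long)mid;
        if (in[mid] < k) lo = mid + 1; else hi = mid;
    }
    return -1;
}

/* batch-seq: the sorter feeds the searcher through tmp. */
static long batch_seq(const int *list, int *tmp, size_t n, int k) {
    sort_impl(list, tmp, n);
    assert(same_bag(list, tmp, n) && ordered(tmp, n));   /* sort ensures */
    return bin_search_impl(tmp, n, k);
}
```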

VSPEC Support Environment: All VSPEC expressions translate into Refine declarations. These declarations support a formal inference process, executable specifications and Refine-based software synthesis tools. Refine is a language that allows programmers to write code in a wide range of styles, from high-level constructs such as sets and transformation rules down to more traditional procedural language constructs such as loops and if-then statements [1]. Because Refine specifications are executable, designers can test their system at a very early point in the design process.


3 System Performance Estimation

Accurate performance estimation is critical to the success of a design synthesis system. The COMET performance estimator evaluates the performance, in terms of area, speed, throughput rate and power dissipation, of the library components as well as of a contemplated hardware-software architecture of a system. The estimator can be used interactively or through the partitioning engine to filter out inferior architectures and to select a constraint-satisfying hardware-software binding for a given specification. As shown in Figure 4, various hardware-software alternatives can be selected for each component in the architecture, and performance envelopes can be generated for each selected configuration.
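Filtering inferior architectures can be viewed as discarding dominated configurations. The sketch below is a hypothetical illustration, not the COMET estimator: each hardware-software configuration carries estimated area, latency and power, a configuration is dropped if another one is no worse in every metric and strictly better in at least one, and the lowest-area survivor that meets given latency and power bounds is selected. The metric set, the units and the area-minimizing selection rule are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical estimate for one hardware-software configuration. */
struct estimate { double area, latency, power; };

/* a dominates b if it is no worse in every metric and better in at least one. */
static bool dominates(const struct estimate *a, const struct estimate *b) {
    bool no_worse = a->area <= b->area && a->latency <= b->latency && a->power <= b->power;
    bool better   = a->area <  b->area || a->latency <  b->latency || a->power <  b->power;
    return no_worse && better;
}

/* Set keep[i] = false for every configuration dominated by another one;
   returns the number of non-dominated configurations. */
static int filter_inferior(const struct estimate *e, int n, bool *keep) {
    int kept = 0;
    for (int i = 0; i < n; i++) {
        keep[i] = true;
        for (int j = 0; j < n && keep[i]; j++)
            if (j != i && dominates(&e[j], &e[i])) keep[i] = false;
        kept += keep[i];
    }
    return kept;
}

/* Among the survivors, pick the lowest-area configuration that meets the
   latency and power constraints; returns its index or -1 if none does. */
static int select_binding(const struct estimate *e, int n, const bool *keep,
                          double max_latency, double max_power) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (!keep[i] || e[i].latency > max_latency || e[i].power > max_power) continue;
        if (best < 0 || e[i].area < e[best].area) best = i;
    }
    return best;
}
```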

entity example is
  port (list : in array of element;
        k : in integer;
        output : out element);
  modifies output;
  ensures
    (fa e : element)
      (output = e) <=> (e in list and k = key(e));
end example;

architecture batch-seq of example is
  component sorter is sort
    port (inlist : in array of element;
          outlist : out array of element);
  component searcher is bin_search
    port (inlist : in array of element;
          value : in integer;
          return : out element);
begin
  b1: sorter
    port map (list, tmp);
  b2: searcher
    port map (tmp, k, output);
end batch-seq;

entity sort is
  port (input : in array of element;
        output : out array of element);
  modifies output;
  ensures
    bag(output) = bag(input) and
    ordered(output);
end sort;

entity bin_search is
  port (input : in array of element;
        k : in integer;
        output : out element);
  modifies output;
  assumes
    ordered(input);
  ensures
    (fa e : element)
      (output = e) <=> (e in input and k = key(e));
end bin_search;

Figure 3: Batch sequential architecture for finding a value in a list.

Figure 4: Performance Estimation

Hardware Performance Estimation: Performance estimation for hardware components is done by detailed analysis of the operational behavior of the component. A data-flow/control-flow graph (DFG) is extracted from the behavioral specification. The DFG is scheduled across control steps using register-level hardware modules selected from a module library. From this scheduled and operator-bound DFG, accurate estimates of area, clock speed and throughput rate are made. Estimation of power consumption is based on the generation of profile data for typical stimuli of the component. The profile data is used to generate estimates of switching activity in the final design. Data from a technology library that contains both ASIC fabrication and packaging technology profiles is used to generate concrete, technology-dependent estimates from the abstract estimates. Some of the hardware performance estimation work has been done as part of the MSS and DSS projects [3, 2].
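The scheduling step in hardware estimation can be sketched with a simple as-soon-as-possible (ASAP) scheduler. This is an assumed, simplified stand-in for the COMET estimator: DFG nodes are given in topological order, the module library is reduced to a hypothetical latency (in control steps) and area per operator type, and the number of units of each type is taken as the peak number of concurrently active operations. The result is the number of control steps and a rough data-path area.

```c
#include <assert.h>

#define MAX_PREDS 2
enum op_type { OP_ADD, OP_MUL, NUM_OP_TYPES };

struct dfg_node {
    enum op_type op;
    int npreds;
    int preds[MAX_PREDS];   /* indices of earlier nodes (topological order assumed) */
};

/* Hypothetical module library: latency in control steps and area per unit. */
static const int op_latency[NUM_OP_TYPES] = { 1, 2 };
static const int op_area[NUM_OP_TYPES]    = { 10, 40 };

/* ASAP schedule: each node starts as soon as all predecessors finish.
   Fills start[] and returns the total number of control steps. */
static int asap_schedule(const struct dfg_node *g, int n, int *start) {
    int steps = 0;
    for (int i = 0; i < n; i++) {
        int s = 0;
        for (int p = 0; p < g[i].npreds; p++) {
            int j = g[i].preds[p];
            int finish = start[j] + op_latency[g[j].op];
            if (finish > s) s = finish;
        }
        start[i] = s;
        int end = s + op_latency[g[i].op];
        if (end > steps) steps = end;
    }
    return steps;
}

/* Area estimate: for each operator type, allocate as many units as the
   peak number of operations of that type active in any control step. */
static int estimate_area(const struct dfg_node *g, int n, const int *start, int steps) {
    int area = 0;
    for (int t = 0; t < NUM_OP_TYPES; t++) {
        int peak = 0;
        for (int c = 0; c < steps; c++) {
            int active = 0;
            for (int i = 0; i < n; i++)
                if (g[i].op == t && start[i] <= c && c < start[i] + op_latency[t])
                    active++;
            if (active > peak) peak = active;
        }
        area += peak * op_area[t];
    }
    return area;
}
```

With unconstrained resources, ASAP gives the minimum number of control steps; a resource-constrained scheduler (for example, list scheduling) would trade extra control steps for fewer units.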


Software Performance Estimation: A static performance evaluation method based on ISA (instruction set architecture) and code models is being developed to provide estimates of DSP software execution time. These estimates will be used to guide system and software partitioning so that timing constraints can be satisfied by the software synthesis algorithms. Once software has been created and compiled, the machine code is evaluated to assess whether timing constraints and throughput requirements have been satisfied. The static performance evaluation method consists of two graph-theoretic models: (1) a pipelined instruction execution time (PIET) model, which is accurate to the clock-cycle level, and (2) an instruction stream execution graph (ISEG) model. The PIET model is constructed for each processor with a unique instruction set architecture and takes into account all data path dependencies, including inter-instruction dependencies, for accurate time evaluation. The ISEG model is constructed for each software program being analyzed and is generated directly from the machine language instruction stream. The ISEG model evaluates all data and control paths within the instruction stream during its formation.

Figure 5: Software Performance Evaluation

The flow of activities to perform static performance evaluation is shown in Figure 5. The objective is to obtain the estimated execution time between any two points in the instruction stream. This time is obtained as an aggregate of the individual operation times of each instruction in the instruction stream, given the PIET model of the ISA of the target processor. All pipelined activity and potential hazards are considered. The execution of each successive sequential instruction is evaluated until a branch instruction is seen. These successive sequential instructions are grouped into basic blocks. The number of machine cycles for each basic block is determined using the PIET graph. The ISEG graph is created as a standard control flow graph in which basic blocks and branch instructions are represented as nodes. Edges in the graph represent flow of control.
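Basic-block formation and cycle counting can be sketched as follows. The sketch assumes a hypothetical single-issue, in-order pipeline in which each instruction names one destination and up to two source registers and makes its result available a fixed number of cycles after issue; an instruction stalls until its sources are ready. This is far simpler than a real PIET model of a TMS320-class processor, and, unlike the ISEG described above, it keeps each branch as the last instruction of its block rather than as a separate node.

```c
#include <assert.h>

#define NUM_REGS 8
#define NO_REG   (-1)

/* Hypothetical instruction record: one destination, up to two sources. */
struct insn {
    int dst, src1, src2;   /* register numbers, or NO_REG */
    int latency;           /* cycles after issue until dst is available */
    int is_branch;         /* nonzero if this instruction ends a basic block */
};

/* Cycles for instructions [first, last] on a single-issue in-order pipeline:
   an instruction issues one cycle after its predecessor, or later if it must
   wait (stall) for a source register produced by an earlier instruction. */
static int block_cycles(const struct insn *code, int first, int last) {
    int ready[NUM_REGS] = {0};   /* cycle at which each register becomes valid */
    int issue = -1;
    for (int i = first; i <= last; i++) {
        int t = issue + 1;
        if (code[i].src1 != NO_REG && ready[code[i].src1] > t) t = ready[code[i].src1];
        if (code[i].src2 != NO_REG && ready[code[i].src2] > t) t = ready[code[i].src2];
        issue = t;
        if (code[i].dst != NO_REG) ready[code[i].dst] = issue + code[i].latency;
    }
    return issue + 1;            /* cycles from first issue through last issue */
}

/* Split the stream into basic blocks ending at branches (or at the end of
   the stream) and record each block's cycle count; returns the block count. */
static int basic_block_cycles(const struct insn *code, int n, int *cycles) {
    int nblocks = 0, first = 0;
    for (int i = 0; i < n; i++) {
        if (code[i].is_branch || i == n - 1) {
            cycles[nblocks++] = block_cycles(code, first, i);
            first = i + 1;
        }
    }
    return nblocks;
}
```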

The execution time between any two nodes in the graph is determined by iteratively reducing the flow network between the two nodes until they are merged into a single node. Each reduction step first examines the flow network and identifies a basic structure that can be reduced, then computes the execution time of that structure and reduces it to a single node whose label is the derived execution time. Branches and loops are assessed based on the branch taken/not-taken probabilities, which are in turn obtained from the benchmark data patterns at the inputs of the software being evaluated. Note that this data is usually expressed in worst-case terms if worst-case execution performance estimates of the software are desired. If the estimated execution time of the entire software program is desired, the entire ISEG graph is reduced to one node by the graph analysis algorithm. The estimated execution time can then be compared with the timing constraints of the system to determine whether the synthesized software satisfies the performance goals.
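The reduction rules can be written down as small functions. The C sketch below covers three basic structures under stated assumptions: a series of two nodes adds their times, a two-way branch weights its paths by the taken probability, and a loop whose back edge is taken with probability p runs its body 1/(1 - p) times on average (a geometric distribution). The function names are hypothetical, and the sketch omits the graph matching that finds reducible structures; a worst-case variant simply takes the longer branch path.

```c
#include <assert.h>

/* Series reduction: two nodes in sequence merge into one whose label
   is the sum of their execution times (in cycles). */
static double reduce_series(double a, double b) { return a + b; }

/* Branch reduction: a branch node of cost c with taken probability p
   selects between paths of cost t (taken) and f (not taken). */
static double reduce_branch(double c, double p, double t, double f) {
    assert(p >= 0.0 && p <= 1.0);
    return c + p * t + (1.0 - p) * f;
}

/* Worst-case variant: assume the longer path is always taken. */
static double reduce_branch_worst(double c, double t, double f) {
    return c + (t > f ? t : f);
}

/* Loop reduction: a body of cost b whose back edge is taken with
   probability p executes 1 / (1 - p) times on average. */
static double reduce_loop(double b, double p) {
    assert(p >= 0.0 && p < 1.0);
    return b / (1.0 - p);
}
```

For example, a 5-cycle block followed by a 1-cycle branch (p = 0.5, paths of 4 and 8 cycles) and then a 6-cycle loop body with back-edge probability 0.75 reduces to 5 + 7 + 24 = 36 cycles.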

Reusable Behavioral Components: COMET uses a library of reusable hardware, software or unbound components for synthesis. The performance of each library component is characterized using the same performance estimation tools described above. System synthesis in COMET depends on the availability of one or more library components for each function specified in VSPEC. If a VSPEC function in a specification has no corresponding component in the library, the user is asked to supply a component along with its operational behavior description in VHDL. The performance of the description for various target hardware and software technologies will be evaluated using the performance estimator, and the component, along with this data, will be stored in the library for later use (Figure 6).


4 System Partitioning

The goal of system partitioning is to generate a first-level hardware-software architecture of the system by partitioning the system specification into specifications of hardware components and software components. The hardware components will be further processed by hardware synthesis tools. The software components will be bound to execute on a selected DSP or general-purpose processor configuration. The hardware and software components will be connected to constitute a VSPEC-VHDL architectural description of the system. The functional requirements and constraints stated in the VSPEC specification drive the derivation of the specific hardware-software mix. Figure 7 shows the system partitioning tool in COMET.

Figure 6: Performance Analysis for Library Components

Figure 7: System Partitioning in COMET

Initially, the VSPEC system specification is refined based on queries into the design library. As a result of the queries, components are selected based on their ability to satisfy the system function and constraint attributes. If the existing components do not meet the requirements, a design that partially satisfies the requirements may be generated. Alternatively, the designer may be queried for additional components.
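The library query step can be sketched in C. The component record, the three constraint attributes and the ranking rule (fewest violated constraints, ties broken by smaller area) are all assumptions made for illustration; the return value distinguishes a full match, a partial match that only partly satisfies the requirements, and the case in which the designer must be asked for a new component.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical library entry: the function it implements, its binding,
   and its characterized performance (as produced by the estimator). */
struct component {
    const char *function;   /* e.g. "fft" */
    int is_hardware;        /* 1 = hardware, 0 = software */
    double area, latency, power;
};

struct constraints { double max_area, max_latency, max_power; };

enum query_result { FULL_MATCH, PARTIAL_MATCH, NO_COMPONENT };

/* Number of constraints an entry violates (0 = fully satisfies). */
static int violations(const struct component *c, const struct constraints *k) {
    return (c->area > k->max_area) + (c->latency > k->max_latency)
         + (c->power > k->max_power);
}

/* Among entries implementing `function`, select the one with the fewest
   constraint violations (ties broken by smaller area). *best receives the
   chosen index, or -1 when the designer must supply a new component. */
static enum query_result query_library(const struct component *lib, int n,
                                       const char *function,
                                       const struct constraints *k, int *best) {
    *best = -1;
    for (int i = 0; i < n; i++) {
        if (strcmp(lib[i].function, function) != 0) continue;
        if (*best < 0) { *best = i; continue; }
        int vi = violations(&lib[i], k), vb = violations(&lib[*best], k);
        if (vi < vb || (vi == vb && lib[i].area < lib[*best].area)) *best = i;
    }
    if (*best < 0) return NO_COMPONENT;
    return violations(&lib[*best], k) == 0 ? FULL_MATCH : PARTIAL_MATCH;
}
```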


5 Hardware Synthesis


The COMET hardware synthesis system consists of a multicomponent partitioning engine and a set of synthesis tools for ASIC, FPGA and MCM synthesis (Figure 8).


Multicomponent Partitioning Engine: The partitioning engine is a hierarchical partitioning and package binding tool that accepts behavioral specifications in VHDL and constraints on area, power consumption, pin counts, speed and cost, and generates a hierarchical partition of the specification with each component in the partition bound to a package from a set of available packages. The partitioning engine uses a backtracking algorithm for constraint-directed search. Power estimation is based on data gathered by dynamic profiling of the VHDL specification using typical stimuli.
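A backtracking search for package binding might look like the following sketch. It assumes a flat (non-hierarchical) partition whose blocks each need some area and I/O pins, and a fixed set of package instances with area and pin capacities; pins are simply summed, ignoring nets shared between blocks in the same package. The names are hypothetical. Each block is tentatively bound to the first package that still has room, and a binding is undone whenever the remaining blocks cannot be placed.

```c
#include <assert.h>

#define MAX_PKGS 8

/* Hypothetical inputs: each partition block needs some area and I/O pins;
   each available package offers an area capacity and a pin budget. */
struct block   { int area, pins; };
struct package { int area_cap, pin_cap; };

/* Try to bind blocks b..n-1, given the area and pins already used in each
   package. Returns 1 and fills where[] on success, 0 otherwise. */
static int bind_from(const struct block *blk, int n, int b,
                     const struct package *pkg, int m,
                     int *area_used, int *pins_used, int *where) {
    if (b == n) return 1;                          /* every block bound */
    for (int p = 0; p < m; p++) {
        if (area_used[p] + blk[b].area > pkg[p].area_cap) continue;  /* prune */
        if (pins_used[p] + blk[b].pins > pkg[p].pin_cap) continue;   /* prune */
        area_used[p] += blk[b].area;               /* tentative binding */
        pins_used[p] += blk[b].pins;
        where[b] = p;
        if (bind_from(blk, n, b + 1, pkg, m, area_used, pins_used, where))
            return 1;
        area_used[p] -= blk[b].area;               /* backtrack */
        pins_used[p] -= blk[b].pins;
    }
    return 0;
}

static int bind_packages(const struct block *blk, int n,
                         const struct package *pkg, int m, int *where) {
    int area_used[MAX_PKGS] = {0}, pins_used[MAX_PKGS] = {0};
    assert(m <= MAX_PKGS);
    return bind_from(blk, n, 0, pkg, m, area_used, pins_used, where);
}
```

In the usage below, first-fit fails on the last block and the search backtracks to move the second block before finding a feasible binding.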
Figure 8: Hardware Synthesis Flow in COMET

High Level Synthesis of ASICs: DSS. The ASIC synthesis system DSS (Distributed Synthesis System) accepts behavioral specifications in VHDL and constraints on clock period and area, and generates register-level designs in VHDL. Register-level designs contain two parts: a data path and a finite state controller. The data path is in the form of structural VHDL in which each component is instantiated from a predefined, parameterized register-level component library. The DSS architecture is shown in Figure 9. For an overview of the DSS system, see [2].

Register-level designs generated by DSS can be processed using various layout synthesis tools, including Lager IV and Mentor Graphics' GDT tools. Figure 10 shows the design flow using the DSS system. Test vectors for register-level and switch-level simulations are automatically created using a test-bench compiler. Figure 11 shows the design of a processor (Move Machine) generated by DSS. DSS has been used to generate numerous designs, including some industrial-strength designs by Texas Instruments [4].

Figure 9: DSS High Level Synthesis System

Figure 10: ASIC Synthesis Using DSS

Figure 11: Move Machine

MCM Synthesis: MSS. The MCM synthesis environment MSS [3] is embedded in COMET to facilitate synthesis of multichip modules. The tools in the MSS environment are shown in Figure 12. Behavior specifications for MSS are written in VHDL. Performance descriptions are written in PDL (Performance Description Language) [5, 6]. Multichip designs can be generated in two ways. As shown in Figure 12, register-level designs generated by DSS can be partitioned into multiple chip designs. Alternatively, as shown in Figure 13, an integrated behavioral synthesis and partitioning step can be performed to obtain multichip designs directly. These multichip designs are then processed by package-level place and route tools. We currently use Mentor Graphics MCM Station and plan to use the Harris EDA Finnesse system in the near future. Figure 14 shows the MCM design of the Find processor generated using the MSS tools.




6 Software Synthesis

The software synthesis tools in COMET translate DSP-based software behavioral specifications expressed in a subset of VHDL into efficient machine code capable of being executed in a multiprocessor environment. The overall approach to software synthesis, shown in Figure 15, is to translate behavioral descriptions expressed in VHDL into C and then use commercial C compilers to translate C into machine code for the target processor. In this way, any processor with a C compiler can be used as a target. The currently supported processors are the Texas Instruments TMS320C51, Sun Microsystems SPARC, and Intel 80386. As explained previously, the compiled code can be statically analyzed for timing performance to ensure compliance with timing constraints expressed in the VSPEC specification.

Figure 12: Multicomponent Synthesis System, MSS

Figure 13: Partitioning with Synthesis in MSS

The VHDL subset used as input for software synthesis is similar to that used for ASIC synthesis [2]. VHDL behavioral constructs are fully supported, along with a limited subset of structural constructs. Explicit timing, such as VHDL after clauses or specific times in wait statements, is not supported.

Translation  into  C  is  a  straightforward  process.  The 
code  generator  is  encapsulated  in  template  functions 
to  allow  future  extensions  to  languages  other  than  C. 
For  example,  the  code  generator  objects  can  be  easily 
changed  to  output  Ada  source  code  strings  rather  than 
C  source  code  strings. 
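The template-function idea can be illustrated with a small Python sketch. It is not the COMET code generator: the classes `CTemplates` and `AdaTemplates` and the function `emit_if_assign` are hypothetical, and only three statement forms are modeled. The generator builds one if-then-else assignment by asking a template object for each fragment, so retargeting from C to Ada only requires passing a different template object.

```python
class CTemplates:
    """Hypothetical C templates for a few statement forms."""

    def assign(self, target: str, value: str) -> str:
        return f"{target} = {value};"

    def if_else(self, cond: str, then_stmt: str, else_stmt: str) -> str:
        return f"if ({cond}) {{ {then_stmt} }} else {{ {else_stmt} }}"

    def equals(self, left: str, right: str) -> str:
        return f"{left} == {right}"


class AdaTemplates:
    """Hypothetical Ada templates for the same statement forms."""

    def assign(self, target: str, value: str) -> str:
        return f"{target} := {value};"

    def if_else(self, cond: str, then_stmt: str, else_stmt: str) -> str:
        return f"if {cond} then {then_stmt} else {else_stmt} end if;"

    def equals(self, left: str, right: str) -> str:
        return f"{left} = {right}"


def emit_if_assign(templates, sel: str, target: str, a: str, b: str) -> str:
    """Emit 'if sel = 1 then target <= a else target <= b' through the templates."""
    return templates.if_else(
        templates.equals(sel, "1"),
        templates.assign(target, a),
        templates.assign(target, b),
    )


if __name__ == "__main__":
    print(emit_if_assign(CTemplates(), "cntrl", "output", "d0", "d1"))
    print(emit_if_assign(AdaTemplates(), "cntrl", "output", "d0", "d1"))
```

Keeping the traversal logic (here `emit_if_assign`) separate from the per-language strings lets a new target be added without changing the generator itself.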

Figure 14: Find MCM

Figure 15: Software Synthesis Flow in COMET

The execution environment consists primarily of a small multitasking operating system kernel that will provide interprocess communication services, task management, and input/output support. The task scheduler will create, maintain and monitor all tasks in the run-time space. The interprocess communication protocol will support a simple message-passing mechanism: whenever a process needs to communicate with others, it writes its request and data into a message channel and then optionally waits until a response is received. The I/O drivers will provide a simple stream capability with support for objects of arbitrary width.
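The request/reply pattern described above can be sketched with Python's standard `queue` and `threading` modules. This is only an illustration under stated assumptions: `MessageChannel`, `server` and the "scale" request are hypothetical, and the real kernel is a small multitasking OS on the target processor, not Python threads. Each message carries its own one-slot reply queue, so the sender can either continue immediately or block until the response arrives.

```python
import queue
import threading
from typing import Any, Optional


class MessageChannel:
    """Hypothetical channel: each message is (request, data, reply_queue)."""

    def __init__(self) -> None:
        self._inbox: queue.Queue = queue.Queue()

    def send(self, request: str, data: Any, wait: bool = False,
             timeout: float = 1.0) -> Optional[Any]:
        """Write a request and its data; optionally block until the reply arrives."""
        reply_queue: queue.Queue = queue.Queue(maxsize=1)
        self._inbox.put((request, data, reply_queue))
        if wait:
            return reply_queue.get(timeout=timeout)
        return None

    def receive(self, timeout: float = 1.0) -> tuple:
        """Take the next pending message, blocking up to `timeout` seconds."""
        return self._inbox.get(timeout=timeout)


def server(channel: MessageChannel, count: int) -> None:
    """Hypothetical task that answers `count` requests; 'scale' doubles the data."""
    for _ in range(count):
        request, data, reply_queue = channel.receive()
        if request == "scale":
            reply_queue.put([2 * x for x in data])
        else:
            reply_queue.put(None)  # unknown request: reply with no result


if __name__ == "__main__":
    channel = MessageChannel()
    task = threading.Thread(target=server, args=(channel, 1))
    task.start()
    print(channel.send("scale", [1, 2, 3], wait=True))  # [2, 4, 6]
    task.join()
```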


7  Test  Tools  in  COMET  and  MSS 

COMET and MSS contain various tools for the testing and simulation of designs as the design process progresses. Designs from the behavioral level to the gate level are expressed in VHDL, so any VHDL simulator can be used to simulate them. Test vectors are automatically generated at various levels of abstraction. These test generation tools take WAVES files as input and generate WAVES files as output. As shown in Figure 12, at the behavioral level, users write WAVES data sets to simulate behavioral descriptions. A multicomponent test-bench compiler translates these data sets into individual WAVES data sets for each of the chips in the multichip design. The data sets also contain expected responses, so expected and actual responses can be compared automatically. Switch-level simulation is facilitated by a switch-level test-bench compiler that generates switch-level tests from WAVES data sets.
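The automatic comparison of expected and actual responses can be sketched in Python. Real WAVES data sets are VHDL-based and much richer than this. The sketch assumes each data-set slice is simply a pair of dictionaries (stimulus, expected response), and `run_data_set` and the `half_adder` model are hypothetical stand-ins for a simulator and a design.

```python
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, int]
# One slice of a hypothetical data set: the stimulus and the expected response.
Slice = Tuple[Vector, Vector]


def run_data_set(model: Callable[[Vector], Vector], data_set: List[Slice]) -> List[str]:
    """Apply each stimulus to the model and report expected/actual mismatches."""
    mismatches = []
    for index, (stimulus, expected) in enumerate(data_set):
        actual = model(stimulus)
        for pin, want in expected.items():
            got = actual.get(pin)
            if got != want:
                mismatches.append(f"slice {index}: {pin} expected {want}, got {got}")
    return mismatches


def half_adder(v: Vector) -> Vector:
    """Hypothetical design under test."""
    return {"sum": v["a"] ^ v["b"], "carry": v["a"] & v["b"]}


if __name__ == "__main__":
    data_set = [({"a": a, "b": b}, {"sum": a ^ b, "carry": a & b})
                for a in (0, 1) for b in (0, 1)]
    data_set.append(({"a": 1, "b": 1}, {"sum": 1, "carry": 1}))  # wrong expectation
    print(run_data_set(half_adder, data_set))  # ['slice 4: sum expected 1, got 0']
```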

In addition to the automatically generated tests, users can add their own tests to the WAVES data sets. To aid users in this process, WAVES usage guides for multicomponent designs, addressing both WAVES Level 1 and Level 2 constructs, are being developed [7, 8].


8  Conclusion 

The COMET design environment is under development as part of the RASSP program. The MSS and DSS systems have been operational for over two years; their development has been funded separately by Solid State Electronics, Wright Lab and ARPA. COMET tools significantly advance the state of the art in automated and vertically integrated synthesis systems. Various tools in the COMET cosynthesis system are interfaced with other commercial and university tools within the RASSP community and produce design and test files in standard notations such as VHDL and WAVES. Through the use of the VSPEC notation, the COMET environment supports design synthesis from abstract, declarative specifications of board- and MCM-level digital signal processing architectures.
Abstract

VHDL allows a designer to describe a digital system by specifying a particular design artifact that implements the desired behavior of the system. However, the operational style used by VHDL forces the designer to make design decisions too early in the design process. In addition, standard VHDL provides no means for specifying non-functional performance constraints such as heat dissipation, propagation delay, clock speed, power consumption and layout area. Thus, VHDL is not appropriate for high-level requirements representation. VSPEC is a Larch-like requirements language used with VHDL that solves these problems. VSPEC adds seven clauses to the VHDL entity structure that allow a designer to declaratively describe the data transformation a digital system should perform and the performance constraints the system must meet. The designer axiomatically specifies the transformation by defining predicates over entity ports and system state that describe input preconditions and output postconditions. A constraints section allows the user to specify timing, power, heat, clock speed and layout area constraints. In combination with the architecture declaration, collections of VSPEC-specified components can define a high-level architecture as interconnected collections of components whose requirements are known (via a VSPEC description) but whose implementations are not. This work presents the VSPEC language and the associated design methodology.


1  Introduction 

VSPEC is a language for declaratively specifying digital systems. It annotates the hardware description language VHDL by adding seven new clauses to the entity construct. These clauses allow a digital system to be specified using a declarative style as opposed to the operational style of VHDL. With VHDL alone, the only way to specify a digital system is by describing a specific design artifact that implements the system’s desired behavior. VSPEC, on the other hand, allows the designer to describe the function of the system without defining its eventual implementation. In short, VSPEC allows the specification of “what” a system should do, as opposed to the VHDL description of “how” the system will do it. This is consistent with Hoare’s definition of specifications [9].

In addition to allowing the specification of “what” instead of “how”, VSPEC addresses another limitation of VHDL: specifying performance constraints. When designing a digital system, meeting certain non-functional (i.e. performance) constraints is as important as creating a system that functions properly. A flight control system so slow that it calculates a flight correction after the plane crashes is obviously inadequate. Because performance constraints are so important in digital systems, they should be specified very early in the design process. However, VHDL does not provide a consistent mechanism for specifying these types of constraints. VSPEC addresses this problem by allowing the designer to specify performance constraints such as heat dissipation, propagation delay, clock speed, power consumption and layout area.

Another way of viewing VSPEC is as a Larch-style interface language for VHDL. The Larch family of specification languages supports a two-tiered, model-based approach to specifying programs [7]. A Larch specification consists of components written in two languages: an Interface Language and the Larch Shared Language. Interface languages are used to specify the interfaces between program components, including component inputs and outputs as well as the observable behavior of the component. Interface languages exist for a variety of programming languages, including C [6], C++ [2], Modula-3 [12] and Ada [5].

Definitions written in the Larch Shared Language (LSL) are the second component of a Larch specification. LSL is a formal algebraic language that defines the underlying sorts and operators used in the Larch Interface Languages [8, 3]. In the VSPEC system, Refine [17] is the primary shared language. Refine contains a wide range of constructs, from high-level sets and transformations down to more traditional loops and conditional statements. All VSPEC clauses can be translated into a Refine representation. There are two main reasons Refine was chosen as the primary shared language for VSPEC.

First, LSL specifications are not executable. Since Refine is a broad-spectrum programming language, some VSPEC specifications are executable. This is a very important feature for a digital system specification language such as VSPEC. VHDL descriptions of digital systems are simulated as early as possible in the design cycle so that bugs can be found when they are least expensive to fix. The same concept extends to a VSPEC requirements specification of a system: the sooner a bug in the requirements specification is found, the less expensive it is to fix. One way to find problems with the specification at the earliest possible point in the design cycle is to execute the specification.

The second reason Refine was chosen as the primary shared language is that it supports synthesis of behavioral VHDL from VSPEC. Refine is one part of a suite of software synthesis tools. Supporting synthesis of behavioral VHDL from VSPEC is one of the main long-term goals of this research.

VSPEC is one part of the COMET research project. The goal of this project is to develop better techniques for rapid prototyping of digital signal processing systems. A detailed description of COMET is beyond the scope of this paper (see [22]). As the project overview in Figure 1 shows, a COMET user begins by writing a description of the function and constraints of the system in VSPEC. This description is then used to partition the system into hardware and software components, along with an architecture for connecting these pieces together. Each of these components is synthesized and integrated into a board-level implementation of the system that is simulated and verified against the original specification.

The remainder of this paper gives a more detailed description of VSPEC. The next section briefly describes the VHDL constructs that are important in VSPEC. Section 3 gives a detailed description of each of the seven VSPEC clauses. This is followed by an extended example where VSPEC is used to specify a small microprocessor. Following this is a section that describes the formal representation of VSPEC. Section 6 discusses other work related to VSPEC, and the paper concludes with a description of the current status and future directions for this research.

Figure 1: Overview of the COMET project.

2  Important  VHDL  Constructs 

This section gives a very brief description of two of the VHDL constructs used in VSPEC. It contains enough information to explain why the VSPEC annotations are needed in a specification language for digital systems. For a more complete description of VHDL, refer to the VHDL language reference manual [10] or a textbook on VHDL [16]. If you are already familiar with VHDL, you can skip this section and begin reading about the VSPEC clauses described in Section 3.

Two of the more important constructs in VHDL are entities and architectures. A VHDL entity declares a digital component by defining the component’s interface. The function of the component is not defined in the entity structure. Instead, each entity has one or more associated architectures where the function of the component is described. This is the “big picture” of how entities and architectures are used. The next few paragraphs give a more detailed description of each of these constructs, starting with the syntax for a VHDL entity:


<entity_declaration> ::= ENTITY <identifier> IS
                             <entity_header>
                             <entity_declarative_part>
                         [ BEGIN
                             <entity_statement_part> ]
                         END [ <entity_simple_name> ] ;


The most important portion of the entity declaration is the entity header. The only part of the entity header currently used in VSPEC is the port clause. A port clause defines the inputs and outputs of the component. Here is an example entity declaration for a simple two-input multiplexor:


ENTITY vhdl_mux IS
    PORT ( D0, D1, cntrl : IN BIT;
           output : OUT BIT );
END vhdl_mux;


Notice  that  this  entity  merely  defines  the  types  of  the  inputs  and  outputs  to  the  multiplexor.  It 
does  not  contain  any  description  of  the  function  of  the  entity. 

The  function  of  the  multiplexor  is  described  in  the  VHDL  architecture.  Each  entity  has  one  or  more 
associated  architectures.  An  architecture  is  used  to  define  the  behavior  of  a  specific  implementation 
of  an  entity.  The  syntax  of  the  architecture  construct  is  as  follows: 


<architecture_body> ::= ARCHITECTURE <identifier> OF <entity_name> IS
                            <architecture_declarative_part>
                        BEGIN
                            <architecture_statement_part>
                        END [ <architecture_simple_name> ] ;

Detailed  descriptions  of  each  portion  of  the  architecture  are  beyond  the  scope  of  this  document 
(see  [10, 16]).  Suffice  it  to  say  that  the  declarative  part  of  the  architecture  defines  the  types,  signals 
and  components  used  by  the  architecture  while  the  statement  part  defines  the  behavior  or  structure 
of  the  entity.  Consider  the  following  architecture  for  the  multiplexor  entity  above: 

ARCHITECTURE behavior OF vhdl_mux IS
BEGIN
    PROCESS ( D0, D1, cntrl )
    BEGIN
        -- select D0 when cntrl is '1', matching the gate-level version below
        IF cntrl = '1' THEN
            output <= D0;
        ELSE
            output <= D1;
        END IF;
    END PROCESS;
END behavior;

This is an example of a behavioral architecture. Behavioral architectures use Ada-like programming constructs to define the function of an entity. In this simple example, an if-then-else statement is used to assign a value (<= is used for signal assignment) to the output port based on the value of cntrl. Although this is a simple example, behavioral architectures can be quite complex. Auxiliary procedures and functions can be written in the declarative part of the architecture, and entire packages of library routines can be used within the architecture. With these auxiliary procedures and packages, a behavioral architecture can be defined using a large program. Regardless of size, all behavioral architectures have one thing in common: they define a single implementation of the behavior of an entity.

Structural architectures are the second common type of VHDL architecture. This architecture type defines the subcomponents an entity is composed of and how those subcomponents are connected. For example, the behavior of the multiplexor could also be defined using and, or and not gates connected as shown in Figure 2. In VHDL, this is represented using the following architecture:
ARCHITECTURE structure OF vhdl_mux IS

    COMPONENT and_gate PORT (in1, in2 : IN BIT; output : OUT BIT);
    END COMPONENT;

    COMPONENT or_gate PORT (in1, in2 : IN BIT; output : OUT BIT);
    END COMPONENT;

    COMPONENT not_gate PORT (input : IN BIT; output : OUT BIT);
    END COMPONENT;

    SIGNAL D0_set, D1_set, cntrl_prime : BIT;

BEGIN

    and_1 : and_gate PORT MAP (in1 => D0, in2 => cntrl, output => D0_set);
    and_2 : and_gate PORT MAP (in1 => D1, in2 => cntrl_prime, output => D1_set);
    not_1 : not_gate PORT MAP (input => cntrl, output => cntrl_prime);
    or_1  : or_gate  PORT MAP (in1 => D0_set, in2 => D1_set, output => output);

END structure;


In this example, the declarative part of the architecture defines three components and three signals. The component declarations (and_gate, or_gate and not_gate) define the inputs and outputs of three subcomponents that will be used in this architecture. The behavior and/or structure of these three subcomponents must be defined by an entity/architecture pair somewhere else in the system (not shown here). Another VHDL construct, the configuration, is used to map components to the entity/architecture pairs that define their behavior. The three declared signals (D0_set, D1_set and cntrl_prime) are used to connect the component instances together as shown in Figure 2.

Instances of each of the components in the architecture’s declarative part are created in the statement part (between BEGIN and END). The port map for each instance shows how that particular component instance is connected to the signals in the architecture.
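To see how the port maps define connectivity, the netlist above can be mimicked in Python. This sketch is a simplified model, not a VHDL simulator. Each instance is a tuple of component type, input signals and output signal, and a gate is evaluated once all of its inputs have values. The instances are listed in the same order as in the VHDL, which is possible because concurrent statements have no inherent ordering. `matches_formula` compares the network with the expression (D0 AND cntrl) OR (D1 AND NOT cntrl) implied by the wiring, for all eight input combinations.

```python
from itertools import product
from typing import Dict, List, Tuple

# Behavior of the three component types (BIT values as 0/1 integers).
GATES = {
    "and_gate": lambda a, b: a & b,
    "or_gate": lambda a, b: a | b,
    "not_gate": lambda a: 1 - a,
}

# One tuple per instance, in port-map order: (component, input signals, output signal).
NETLIST: List[Tuple[str, List[str], str]] = [
    ("and_gate", ["D0", "cntrl"], "D0_set"),        # and_1
    ("and_gate", ["D1", "cntrl_prime"], "D1_set"),  # and_2
    ("not_gate", ["cntrl"], "cntrl_prime"),         # not_1
    ("or_gate", ["D0_set", "D1_set"], "output"),    # or_1
]


def simulate(netlist, ports: Dict[str, int]) -> Dict[str, int]:
    """Evaluate each instance once all of its input signals have values."""
    signals = dict(ports)
    pending = list(netlist)
    while pending:
        ready = [inst for inst in pending if all(s in signals for s in inst[1])]
        if not ready:
            raise ValueError("netlist has signals that are never driven")
        for component, inputs, output in ready:
            signals[output] = GATES[component](*(signals[s] for s in inputs))
        pending = [inst for inst in pending if inst not in ready]
    return signals


def matches_formula() -> bool:
    """Compare the network with (D0 AND cntrl) OR (D1 AND NOT cntrl) exhaustively."""
    for d0, d1, cntrl in product((0, 1), repeat=3):
        out = simulate(NETLIST, {"D0": d0, "D1": d1, "cntrl": cntrl})["output"]
        if out != (d0 & cntrl) | (d1 & (1 - cntrl)):
            return False
    return True
```

`matches_formula()` returns True. In other words, the gate network selects D0 when cntrl is '1' and D1 when it is '0'.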

Although this example is very small, the same basic concepts scale to much larger systems. This multiplexor could be part of an ALU that is a subcomponent of a large microprocessor, which is itself one component of a board-level system. The same type of structural architecture is used to connect the system together at each of these levels. The lowest level (the and, or and not gates in this example) contains a behavioral description of the components. Because VSPEC is an extension of VHDL, these features for dealing with large systems are also found in VSPEC.


3  The VSPEC Clauses


The VSPEC language annotates VHDL by adding seven new clauses to the entity structure. The modified syntax for the entity structure becomes:


<entity_declaration> ::= ENTITY <identifier> IS
                             <entity_header>
                             [ <vspec_clause_list> ]
                             <entity_declarative_part>
                         [ BEGIN
                             <entity_statement_part> ]
                         END [ <entity_simple_name> ] ;


The only change made to the VHDL syntax was the addition of the optional VSPEC clause list to the entity declaration.¹ All other constructs remain intact. A VSPEC clause list is a list of VSPEC clauses separated by semicolons:


<vspec_clause_list> ::= <vspec_clause> { ; <vspec_clause> } ;


¹ This statement is not completely accurate, since VHDL’s expression syntax was also extended to include quantifiers, logical implication and support for sets and sequences. This is described in more detail in the VSPEC Language Reference Manual [13].
<vspec_clause> ::= <requires_clause> | <ensures_clause> | <state_clause>
                 | <constrained_by_clause> | <modifies_clause>
                 | <based_on_clause> | <includes_clause>

These VSPEC clauses can be grouped into four broad classes. The first class defines the function of the entity and includes the requires and ensures clauses. The second class declares the internal state of the entity in the state clause. The third class defines the constraints placed on the system; the constrained by and modifies clauses fall into this category. Finally, the includes and based on clauses are used to help map the VSPEC definition to its formal representation in Refine. These are the only two clauses that can appear more than once in a VSPEC clause list. The following subsections describe each of these clauses in more detail.
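The grouping and the multiplicity rule above can be expressed as a small checker. This Python sketch is hypothetical. It assumes a clause list has already been reduced to its leading keywords, with `CONSTRAINED BY` and `BASED ON` treated as two-word keywords, and `check_clause_list` is not part of any VSPEC tool.

```python
from collections import Counter
from typing import List

# The four classes of clauses described above.
CLASS_OF = {
    "REQUIRES": "function",
    "ENSURES": "function",
    "STATE": "state",
    "CONSTRAINED BY": "constraint",
    "MODIFIES": "constraint",
    "INCLUDES": "formal mapping",
    "BASED ON": "formal mapping",
}

# Only these two clauses may be repeated within one clause list.
REPEATABLE = {"INCLUDES", "BASED ON"}


def check_clause_list(keywords: List[str]) -> List[str]:
    """Return error messages for unknown or illegally repeated clauses."""
    errors = []
    for keyword, count in Counter(keywords).items():
        if keyword not in CLASS_OF:
            errors.append(f"unknown clause: {keyword}")
        elif count > 1 and keyword not in REPEATABLE:
            errors.append(f"{keyword} appears {count} times")
    return errors
```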

3.1  Requires  Clause 

<requires_clause> ::= REQUIRES <logical_expression> ;

The requires clause states the pre-condition for the entity. If the entity’s inputs and current state make the requires logical expression true, then the entity is guaranteed to perform its specified function. The behavior of the entity is undefined if the requires clause is false. A designer who uses an entity specified with VSPEC must ensure that the requires logical expression is true before the entity is used. Consider the following example:


ENTITY search IS
    PORT ( input  : IN ARRAY OF record_type;
           key    : IN INTEGER;
           output : OUT record_type );
    REQUIRES sorted(input);
    ENSURES element_of(output, input) AND output.keyval = key;
    INCLUDES "sort.re", "set.re";
END search;
In this example, sorted is a function defined in the file “sort.re” (see the description of the includes clause in Section 3.6) that returns true if the array passed in is in order and false otherwise. The search entity above will only function properly if the input array is sorted. If the input is not in order, the function of search is undefined. The same holds for every entity: its function is undefined whenever its requires clause is false. For this reason, the pre-conditions expressed in the requires clause should be kept as simple as possible. The more conditions that must be met for the requires clause to be true (i.e. the more complex the pre-condition), the more difficult it will be to meet the pre-condition and use the entity. A pre-condition of true means the entity has no pre-condition; it must function properly on all input values.
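Because the requires and ensures clauses are predicates, they can be evaluated around any candidate implementation. The Python sketch below assumes a meaning for `sorted` (keys in non-decreasing order) and `element_of` (list membership). `Record`, `binary_search` and the helper names are hypothetical. Binary search is one implementation that relies on the pre-condition: on unsorted input it can miss a key that is present, which is exactly the undefined behavior the requires clause rules out.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    keyval: int
    payload: str


def is_sorted(records: List[Record]) -> bool:
    """Assumed meaning of sorted(): keys in non-decreasing order."""
    return all(a.keyval <= b.keyval for a, b in zip(records, records[1:]))


def requires(records: List[Record]) -> bool:
    return is_sorted(records)


def ensures(output: Optional[Record], records: List[Record], key: int) -> bool:
    # element_of(output, input) AND output.keyval = key.  As written, this can
    # only hold when some record actually carries the key.
    return output is not None and output in records and output.keyval == key


def binary_search(records: List[Record], key: int) -> Optional[Record]:
    """One implementation that depends on the pre-condition."""
    keys = [r.keyval for r in records]
    i = bisect_left(keys, key)
    if i < len(records) and records[i].keyval == key:
        return records[i]
    return None
```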

One part of the requires clause definition has been glossed over so far: what is a logical expression? All logical expressions in the VSPEC clauses use a syntax that is an extension of VHDL. The VHDL expression syntax supports the standard Boolean operators and, or and not. VSPEC extends this syntax by adding constructs for variable quantification and logical implication. In addition, the VSPEC expression syntax includes constructs for sets and sequences. See the VSPEC Language Reference Manual [13] for a more detailed description of the syntax of VSPEC expressions.

3.2  Ensures  Clause 

<ensures_clause> ::= ENSURES <logical_expression> ;

The ensures clause states the post-condition of the entity. A designer implementing an entity specified with VSPEC must ensure that this logical expression is true whenever the entity processes valid input (i.e. input that makes the requires logical expression true). Consider the following example:


ENTITY vhdl_mux IS
    PORT ( D0, D1, cntrl : IN BIT;
           output : OUT BIT );
    REQUIRES true;
    ENSURES output = (D0 AND cntrl) OR (D1 AND (NOT cntrl));
END vhdl_mux;

This is a VSPEC description of the two-input multiplexor specified in Section 2. The requires clause states that this entity is guaranteed to work for all legal values of the input variables. The logical expression in the ensures clause declaratively specifies the function of the entity: it is a condition that must be true whenever the entity functions properly. Thus, the ensures logical expression describes the functional requirements of the entity.

For this simple multiplexor example, the differences between a VHDL behavioral description and VSPEC may not seem significant. For a more telling example, consider the specification of a sorting component. In VHDL, the simplest way to specify a sorter is an entity with a behavioral architecture describing its function. This behavioral architecture would be an Ada-like description of a specific sorting algorithm such as bubble sort or quicksort. This commits the design of the component to a specific implementation at a very early stage in the design process. In reality, this behavioral architecture is a description of “how” the sorter should work, not “what” the sorter should do. It biases the implementation towards a specific design (i.e. a bubble sort or quicksort) and forces a designer to deal with unnecessary detail at a very early point in the design process.

On  the  other  hand,  a  sorting  component  could  be  described  in  VSPEC  like  this: 

ENTITY sorter IS
    PORT ( input  : IN ARRAY OF INTEGER;
           output : OUT ARRAY OF INTEGER );
    REQUIRES true;
    ENSURES permutation(output, input) AND
            sorted(output);
    INCLUDES "sort.re";
END sorter;

In this example, permutation is a function (defined in “sort.re”) that returns true if output contains all the same elements as input, while sorted is the same function used in Section 3.1. This code describes a sorting component as something that ensures input and output contain the same elements and that output is in order. Thus, the specification above describes the functional requirements of the sorter without describing an implementation of a sorting algorithm. In other words, this definition describes “what” the sorter must do instead of defining “how” it should be done. VHDL alone does not allow this type of description; the VSPEC requires and ensures clauses add this feature to VHDL.
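The sorter’s ensures clause can also be read as an executable check. In this Python sketch, `permutation` is assumed to compare multisets (the same elements with the same multiplicities), and `is_sorted` is assumed to mean non-decreasing order. The two sorting routines are ordinary textbook algorithms, chosen only to show that different “how” descriptions satisfy the same “what”.

```python
from collections import Counter
from typing import List


def permutation(output: List[int], input_: List[int]) -> bool:
    """Same elements with the same multiplicities (assumed meaning)."""
    return Counter(output) == Counter(input_)


def is_sorted(output: List[int]) -> bool:
    return all(a <= b for a, b in zip(output, output[1:]))


def sorter_ensures(output: List[int], input_: List[int]) -> bool:
    """ENSURES permutation(output, input) AND sorted(output)."""
    return permutation(output, input_) and is_sorted(output)


def bubble_sort(xs: List[int]) -> List[int]:
    ys = list(xs)
    for end in range(len(ys) - 1, 0, -1):
        for j in range(end):
            if ys[j] > ys[j + 1]:
                ys[j], ys[j + 1] = ys[j + 1], ys[j]
    return ys


def merge_sort(xs: List[int]) -> List[int]:
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    merged: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```

Both `sorter_ensures(bubble_sort(data), data)` and `sorter_ensures(merge_sort(data), data)` hold for any integer list, while a result that drops an element or leaves two out of order fails the check.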

3.3  State  Clause 

<state_clause> ::= STATE <vspec_variable_declaration_list> ;

The purpose of the state clause is to define a list of variables that store the state of an entity. In most algebraic specification languages (such as Larch [7]), a computational unit is defined as a transformation from inputs to outputs. This type of transformation is not adequate for specifying systems with VSPEC. Unlike typical subprograms, an entity’s local storage is not re-initialized for each use of the entity: buffers and registers retain their values from one use of the entity to the next. The state clause provides a means to model this. The variables declared in the state clause serve as the local storage for the entity. In addition, hardware designers naturally think in terms of the state of a device, and the state clause allows them to extend this thought process to the specification of the digital system.

The syntax for a VSPEC variable declaration list is:

<vspec_variable_declaration_list> ::= <vspec_variable_declaration> { , <vspec_variable_declaration> }

<vspec_variable_declaration> ::= <identifier_list> : <subtype_indication>

An identifier list is a comma-separated list of identifiers, while a subtype indication is the VHDL construct used to declare the type of a variable. In most cases, this is just an identifier that names the type of the declared variable(s); refer to the VHDL documentation for a more complete description [10, 16].
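The difference between a pure input-to-output transformation and an entity with a state clause can be sketched in Python. The accumulator below is a hypothetical entity whose state clause would declare a single variable `total`. An object attribute stands in for that state, so its value survives from one activation to the next, and `ensures` relates the state before and after each use.

```python
class Accumulator:
    """Hypothetical entity with one state variable, total."""

    def __init__(self) -> None:
        self.total = 0  # state persists across activations, like a register

    def activate(self, value: int) -> int:
        pre_total = self.total            # state before this use of the entity
        self.total = pre_total + value    # state after this use
        assert ensures(pre_total, value, self.total)
        return self.total


def ensures(pre_total: int, value: int, post_total: int) -> bool:
    """Post-condition over pre- and post-activation state."""
    return post_total == pre_total + value


def stateless_add(value: int) -> int:
    """A pure transformation: nothing is remembered between calls."""
    return value
```

Calling `activate(3)` and then `activate(4)` on the same object yields 3 and then 7, whereas the stateless version can only ever report the current input.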

3.4  Constrained  By  Clause 

<constrained_by_clause> ::= CONSTRAINED BY <logical_expression> ;

While the ensures clause is used to describe the functional requirements placed on a system, the constrained by clause is used to describe the performance requirements of the system. Consider the effect of adding the following clause to the sorter example in Section 3.2:

CONSTRAINED BY
    area <= 2 um * 5 um AND
    power <= 20 mW AND
    input<->output <= 100 us;

With this additional clause, the VSPEC entity now supplies information about the area the entity must be implemented in, the maximum power consumption of the entity and the pin-to-pin timing for the entity. VHDL does not provide a convenient way to specify these types of performance constraints. The constrained by clause provides a standard method for specifying the non-functional requirements of the system.

The  logical  expression  used  in  the  constrained  by  clause  must  be  a  conjunction  of  constraint 
expressions.  The  syntax  for  these  expressions  is: 

<constraint_expression> ::= <constraint_type> <relational_op> <constraint_value>

where the relational operators are the standard VHDL operators <=, <, >=, >, = and /= (not equal), and the constraint value is either a physical literal or a product of two physical literals (e.g. 10 um * 40 um). In VHDL, a physical literal is simply a number followed by a unit (10 mW, for example). Each constraint expression restricts the legal value of the constraint type to a given range, for instance power < 1 W.

VSPEC currently recognizes five constraint types: area, heat dissipation, power consumption, clock frequency and pin-to-pin timing. In a constraint expression, the first four of these constraint types are referenced with an identifier; respectively, these identifiers are area, heat, power and clock_frequency. A slightly different notation is used to specify the final constraint type, pin-to-pin timing. The syntax for this type of constraint is:

<timing_expression> ::= <input_pin> <-> <output_pin>

where input pin and output pin are identifiers that represent an input port and an output port of the entity. Thus, an expression such as input <-> output < 100 us states that a change in the data at the input port is propagated to the output port in less than 100 microseconds.

As  mentioned  above,  constraint  values  are  either  a  physical  literal  or  the  product  of  two  physical 
literals.  Area  is  the  only  constraint  type  where  a  constraint  value  is  the  product  of  two  physical 
literals.  Area  must  be  specified  in  this  fashion  with  the  two  values  representing  the  bounding  box 
that  the  entity  must  fit  into.  All  other  constraint  types  have  values  that  are  physical  literals. 

There are several predefined units that are used for constraint values in VSPEC. The base units of
these predefined units are meters for area, watts for power consumption, hertz for clock frequency
and seconds for pin-to-pin timing. In addition to these base units, each of these units can also
be expressed using the standard metric prefixes (e.g. the sides of an area could be given in fm, um,
mm, cm, m or km). VHDL also allows the declaration of virtually any other physical type (see the
physical type definition in a VHDL reference [10, 16]).
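As an illustration of how constraint values could be compared, the following Python sketch normalizes physical literals to base units and evaluates a relational constraint. It is a minimal sketch under stated assumptions, not part of the VSPEC tools: the prefix table, the unit symbols m, W, Hz and s, and the helper names `to_base`, `satisfies` and `fits_bounding_box` are hypothetical. Heat dissipation is left out because its base unit is not given here, and an area constraint is read as a bounding box whose two sides must each be respected.

```python
import operator

# Hypothetical tables: metric prefixes and the base units named in the text.
PREFIXES = {"f": 1e-15, "p": 1e-12, "n": 1e-9, "u": 1e-6, "m": 1e-3,
            "c": 1e-2, "": 1.0, "k": 1e3, "M": 1e6, "G": 1e9}
BASE_UNITS = ("Hz", "m", "W", "s")  # longest symbol first so "Hz" is tried first

# The VHDL relational operators allowed in a constraint expression.
RELOPS = {"<=": operator.le, "<": operator.lt, ">=": operator.ge,
          ">": operator.gt, "=": operator.eq, "/=": operator.ne}


def to_base(literal: str) -> float:
    """Convert a physical literal such as '10 mW' into a value in base units."""
    number, unit = literal.split()
    for base in BASE_UNITS:
        prefix = unit[: -len(base)]
        if unit.endswith(base) and prefix in PREFIXES:
            return float(number) * PREFIXES[prefix]
    raise ValueError(f"unknown unit: {unit}")


def satisfies(measured: str, relop: str, bound: str) -> bool:
    """Evaluate '<constraint> <relop> <bound>' for an estimated value."""
    return RELOPS[relop](to_base(measured), to_base(bound))


def fits_bounding_box(width: str, height: str, box_w: str, box_h: str) -> bool:
    """Assumed reading of 'area <= W * H': each side must fit (no rotation)."""
    return to_base(width) <= to_base(box_w) and to_base(height) <= to_base(box_h)


print(satisfies("10 mW", "<", "1 W"))                          # True
print(satisfies("80 us", "<", "100 us"))                       # True
print(fits_bounding_box("12 um", "30 um", "10 um", "40 um"))   # False
```

Because values are floats, the `=` and `/=` operators in this sketch compare exactly; a real tool would likely need a tolerance.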

In addition to the five predefined constraints, the language design allows VSPEC users to create their
own constraint types. This has not yet been implemented in the VSPEC system, but the functionality is
part of the overall plan for the language.
3.5 Modifies Clause

<modifies-clause> ::= MODIFIES <identifier-list> ;

The modifies clause is used to build the list of signals and variables that the entity will modify. The
entity is guaranteed to change only the signals and variables in this modifies list; the values of all
other signals in the entity are left unchanged. Since out mode port signals and variables in the state
clause would serve no purpose if the entity did not change them, all out mode port signals and all
state variables are automatically included in the modifies list. You may explicitly write them in the
identifier list of the modifies clause if you wish, but this step is unnecessary. On the other hand,
global variables² and buffer or inout mode port signals may only be modified if they are included in
the modifies list. It is an error to place in mode port signals in the modifies list, since the definition
of VHDL does not allow an entity to change the value of an input signal. Here is a simple example to
clarify which signals and variables will and will not occur in the modifies list:


ENTITY modifies_example IS
  PORT ( A    : IN integer;
         B    : OUT real;
         C, D : BUFFER bit;
         E, F : INOUT bit );
  STATE G : integer;
  MODIFIES C, E;
END modifies_example;


The list of signals and variables this entity will modify is C, E, B and G. C and E are included in this
list because they are explicitly named in the modifies clause. B is included because it is an output
signal: all architectures of an entity must assign a value to all entity output signals, so B is
automatically included in the modifies list. G is included for a similar reason: the definition of VSPEC
forces the entity to assign a value to all state variables, so all state variables are automatically
included in the modifies list. D and F are not in the list because they are buffer and inout signals that
are not named in the modifies clause, so the entity must leave them unchanged; A is an in mode port
and can never be modified.

² Global variables were added in the 1993 version of VHDL. Previous definitions of the language did
not contain global variables.
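The rules above amount to a small set computation. The following Python sketch is illustrative only: the `Port` record and the `modifies_list` helper are hypothetical (not part of the VSPEC system), and names that are not ports, such as global variables, are simply accepted when they appear in the explicit list. Applied to modifies_example, it reproduces the list derived above.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Port:
    name: str
    mode: str  # "in", "out", "buffer" or "inout"


def modifies_list(ports: Iterable[Port], state_vars: Iterable[str],
                  explicit: Iterable[str]) -> Set[str]:
    """Names an entity may change: out ports, state variables, explicit MODIFIES."""
    ports = list(ports)
    explicit = list(explicit)
    modes = {p.name: p.mode for p in ports}
    # It is an error to name an in mode port in the modifies clause.
    for name in explicit:
        if modes.get(name) == "in":
            raise ValueError(f"in mode port {name} may not appear in MODIFIES")
    # Out ports and state variables are included automatically.
    implicit = {p.name for p in ports if p.mode == "out"} | set(state_vars)
    return implicit | set(explicit)


# The modifies_example entity from the text.
ports = [Port("A", "in"), Port("B", "out"), Port("C", "buffer"),
         Port("D", "buffer"), Port("E", "inout"), Port("F", "inout")]
print(sorted(modifies_list(ports, ["G"], ["C", "E"])))  # ['B', 'C', 'E', 'G']
```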

3.6  Includes  Clause 

<includes-clause> ::= INCLUDES <string-literal-list> ;

The includes clause is used to include a Refine program in a VSPEC specification. This Refine
program defines the functions and types used in the specification, and it helps map the VSPEC
specification to its formal representation in the Refine object base. A VSPEC specification may
contain as many includes clauses as the user needs to describe the system. We have already seen
an example of the includes clause in the search entity described in Section 3.1:


ENTITY search IS
  PORT ( input  : IN ARRAY OF INTEGER;
         key    : IN INTEGER;
         output : OUT ARRAY OF INTEGER );
  REQUIRES sorted(input);
  INCLUDES "sort.re", "set.re";
END search;


In this example, the file “sort.re” contains the following Refine definition of the sorted function:


function sorted ( input-seq : seq(integer) ) : boolean =
  if size(input-seq) <= 1 then
    true
  else
    ( input-seq(1) < input-seq(2) ) and sorted(rest(input-seq))


This is a boolean function that returns true when the elements of the input sequence are in strictly
increasing order, from smallest to largest. In formal logic, a boolean function is called a predicate.
VSPEC users can define arbitrarily many predicates to describe the observable behaviors of the system
being designed. Each of these predicates can appear in the requires or ensures clauses to describe a
functional requirement of the system. Every predicate that appears in these clauses must be defined
in a Refine file that is listed in one of the includes clauses of the entity where it is used.
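To make the meaning of a requires-clause predicate concrete, here is a hedged Python transliteration of `sorted`, of the kind one might use in a hypothetical simulation harness to check that an input to search satisfies REQUIRES sorted(input). The names `is_sorted` and `search_precondition` are illustrative assumptions. The code follows the recursive structure of the Refine definition and treats an empty sequence as sorted.

```python
from typing import Sequence


def is_sorted(seq: Sequence[int]) -> bool:
    """Transliteration of the Refine `sorted` predicate: strictly increasing order."""
    if len(seq) <= 1:                 # empty and one-element sequences are sorted
        return True
    # seq[1:] plays the role of Refine's rest(input-seq)
    return seq[0] < seq[1] and is_sorted(seq[1:])


def search_precondition(input_seq: Sequence[int]) -> bool:
    """REQUIRES sorted(input) for the search entity, as a runtime check."""
    return is_sorted(input_seq)


print(search_precondition([2, 5, 9]))   # True
print(search_precondition([2, 2, 9]))   # False: '<' rejects duplicates
```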

3.7  Based  On  Clause 

<based-on-clause> ::= <vspec-type> BASED ON <refine-sort>

The based on clause is used to map a data type used in VSPEC to its definition in Refine. This
definition in Refine is called a sort. In the syntax above, vspec-type is an identifier that refers to
the data type used in VSPEC, and refine-sort is an identifier that represents the corresponding sort
in Refine.

The VSPEC system provides a built-in mapping to Refine for all predefined types in VHDL. This is
accomplished by automatically including based on clauses for these predefined types in all VSPEC
entities. The VHDL types integer, real, boolean, character and string map to their corresponding
types in Refine. The VHDL types severity_level, bit and bit_vector map to the following definitions
in Refine:


type severity_level = {'note, 'warning, 'error, 'failure}

type bit = {0, 1}

type bit_vector = seq(bit)


This means that the VSPEC system adds based on clauses such as integer BASED ON integer,
character BASED ON char and bit_vector BASED ON bit_vector to all VSPEC entities. In addition,
VSPEC automatically includes a Refine file that contains the three types above. With these
clauses included in all VSPEC entities, the predefined types in VHDL may be used in any VSPEC
specification.
4 Formal Representation of VSPEC

All VSPEC definitions can be transformed into a formal definition. This formal definition is based
on an extension of the domain theories defined in the CYPRESS [19] and KIDS [21, 20] systems. CYPRESS
and KIDS are software synthesis systems that can be used to synthesize an efficient executable
program from an algebraic specification. A domain theory is used to describe the problem to be
synthesized. It consists of a tuple of the domain ($D$), range ($R$), input pre-condition ($I(x : D)$)
and output post-condition ($O(x : D, z : R)$), commonly referred to as a DRIO model. In VSPEC,
the DRIO model can be constructed using the following rules:

$D = d_1 \times d_2 \times \cdots \times d_n$, where each $d_k$ is the sort (defined by the based on
clause) representing the type associated with an in, inout or buffer port or a state variable

$R = r_1 \times r_2 \times \cdots \times r_m$, where each $r_j$ is the sort representing the type
associated with an element in the modifies list (see Section 3.5)

$I(x : D) = I_v(x : D)$, where $I_v(x : D)$ is the logical sentence defined by the requires clause

$O(x : D, z : R) = O_v(x : D, z : R)$, where $O_v(x : D, z : R)$ is the logical sentence defined by the
ensures clause
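The construction of $D$ and $R$ is mechanical. The minimal Python sketch below uses a hypothetical helper, `drio_signature` (not part of the VSPEC tools), to list the factors of each product for the modifies_example entity of Section 3.5. It assumes that each VHDL type maps to a Refine sort of the same name, including real, and it omits global variables.

```python
from typing import List, Sequence, Tuple

Typed = Tuple[str, str]  # (name, sort)


def drio_signature(ports: Sequence[Tuple[str, str, str]],
                   state_vars: Sequence[Typed],
                   modifies: Sequence[str]) -> Tuple[List[Typed], List[Typed]]:
    """Return the factors of D and R for an entity.

    ports are (name, mode, sort) triples; modifies is the expanded modifies list.
    """
    sort_of = {name: sort for name, _, sort in ports}
    sort_of.update(dict(state_vars))
    # D: in, inout and buffer ports, followed by the state variables.
    domain = [(n, s) for n, mode, s in ports if mode in ("in", "inout", "buffer")]
    domain += list(state_vars)
    # R: one factor per element of the modifies list.
    range_ = [(n, sort_of[n]) for n in modifies]
    return domain, range_


ports = [("A", "in", "integer"), ("B", "out", "real"),
         ("C", "buffer", "bit"), ("D", "buffer", "bit"),
         ("E", "inout", "bit"), ("F", "inout", "bit")]
D, R = drio_signature(ports, [("G", "integer")], ["C", "E", "B", "G"])
print([s for _, s in D])  # ['integer', 'bit', 'bit', 'bit', 'bit', 'integer']
print([s for _, s in R])  # ['bit', 'bit', 'real', 'integer']
```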

VSPEC is somewhat different from the specification languages that are normally used with CYPRESS
and KIDS. A specification language for digital systems must provide a means for describing the
performance constraints of the system. One way to do this would be to include these constraints in
the output post-condition of the system. However, this is not the approach taken with VSPEC.
Performance constraints do not describe the function of the system, so we feel it is appropriate to
separate them from the functional requirements defined in the post-condition of the system (i.e. the
ensures clause).
This is one reason the constrained by clause is included in VSPEC. The system's performance
constraints are specified in the constrained by clause, while the ensures clause describes the
functional requirements of the system. The performance constraints can be represented in the
formal model of VSPEC by extending the DRIO model to a DRIOC model:

$C(c_1 : C_1, \ldots, c_n : C_n) = C_v(c_1 : C_1, \ldots, c_n : C_n)$, where each $c_k$ is a constraint variable
such as heat or area, $C_k$ is the sort associated with $c_k$, and $C_v$ is the logical expression defined
in the constrained by clause

The definitions in the DRIOC model describe the system as a transformation mapping the current state
and inputs into the next state and outputs such that, whenever the input pre-condition is satisfied,
the output post-condition and constraints are also satisfied. Formally, this can be written as:

$$\forall x : D \cdot I(x) \Rightarrow O(x, f(x)) \land C(c_1, \ldots, c_n) \qquad (1)$$

where $f(x)$ is the transformation performed by the system. This axiom shows the relationship
between the design, $f(x)$, and its requirements. In VSPEC, $I(x)$ is derived from the requires
clause, $O(x, z)$ from the ensures clause and $C(c_1, \ldots, c_n)$ from the constrained by clause, and
$f(x)$ will be defined using behavioral VHDL. Finding $f(x)$ given $I$, $O$ and $C$ is the synthesis
problem addressed by COMET. Proving that Equation (1) holds for a given $f(x)$, $I$, $O$ and $C$
verifies that $f(x)$ is an implementation of the VSPEC specification.
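Proving Equation (1) in general requires a theorem prover, but on a small finite domain the axiom can at least be tested exhaustively. The Python sketch below is only an illustration: the integer-square-root specification, the constraint estimates and the helper `check_axiom` are all hypothetical. A passing check on a finite subset of $D$ is evidence, not a proof.

```python
import math
from typing import Callable, Iterable


def check_axiom(domain: Iterable[int],
                pre: Callable[[int], bool],
                post: Callable[[int, int], bool],
                f: Callable[[int], int],
                constraints_hold: bool) -> bool:
    """Bounded check of: forall x:D . I(x) => O(x, f(x)) and C(c1, ..., cn)."""
    # f is only evaluated where the pre-condition holds (short-circuit `or`).
    return all((not pre(x)) or (post(x, f(x)) and constraints_hold)
               for x in domain)


# Hypothetical specification: integer square root on a small domain.
domain = range(-5, 101)


def pre(x: int) -> bool:          # plays the role of the requires clause
    return x >= 0


def post(x: int, z: int) -> bool:  # plays the role of the ensures clause
    return z * z <= x < (z + 1) * (z + 1)


estimates = {"power": 0.4, "clock_frequency": 50e6}   # assumed estimates
constraints = estimates["power"] < 1.0 and estimates["clock_frequency"] >= 40e6

print(check_axiom(domain, pre, post, math.isqrt, constraints))         # True
print(check_axiom(domain, pre, post, lambda x: x // 2, constraints))   # False
```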
5 Extended Example: 16-bit Move Machine

5.1  Problem  Description 

The  Move  Machine  is  a  simple  microprocessor  whose  instructions  move  data  between  CPU  registers 
and  main  memory  [18].  The  computational  units  of  the  machine  are  assumed  to  be  memory  mapped. 
With  this  assumption,  arithmetic  and  logical  computations  are  performed  as  side  effects  of  moving 
data  to  and  from  designated  memory  locations. 

5.1.1  Physical  Configuration 

The  physical  storage  components  of  the  Move  Machine  are  a  main  memory  array  and  a  set  of 
registers.  The  registers  consist  of  an  instruction  pointer,  an  instruction  register,  and  an  array  of 
general  purpose  registers. 

In  this  example,  a  16-bit  Move  Machine  is  specified.  The  configuration  used  has  16  general  purpose 
registers,  each  16  bits  long.  The  main  memory  size  is  512  bytes  (256  16-bit  values),  requiring  8-bit 
addressing.  The  instruction  pointer  is  8  bits  and  the  instruction  register  is  16  bits. 

5.1.2  Instruction  Format 

The  instructions  of  the  16-bit  Move  Machine  have  four  fields: 

• A two-bit op-code. The Move Machine has four operations: load, store, jump and halt.

• A two-bit addressing mode, which determines how the effective address is specified in the
instruction. The four addressing modes are absolute, immediate, indirect and relative.

• A four-bit register identifier, which specifies the register that takes part in the operation.

• An eight-bit effective address, which, in conjunction with the addressing mode, determines
the memory location that takes part in the instruction.
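The configuration and the instruction format fit together exactly. The following Python arithmetic check makes this explicit; the constant names follow the Refine declarations in Figure 3, and the field widths are taken from the list above.

```python
# Field widths of a 16-bit Move Machine instruction, from the list above.
FIELDS = {"op_code": 2, "addr_mode": 2, "reg_id": 4, "eff_addr": 8}
WORD_SIZE = 16            # bits per word and per instruction
MM_SIZE = 256             # words of main memory
REGISTER_ARRAY_SIZE = 16  # general purpose registers

assert sum(FIELDS.values()) == WORD_SIZE              # the fields fill one word
assert 2 ** FIELDS["eff_addr"] == MM_SIZE             # 8 bits address 256 words
assert 2 ** FIELDS["reg_id"] == REGISTER_ARRAY_SIZE   # 4 bits select 16 registers
assert 2 ** FIELDS["op_code"] == 4 == 2 ** FIELDS["addr_mode"]  # 4 ops, 4 modes

memory_bytes = MM_SIZE * WORD_SIZE // 8
print(memory_bytes)  # 512
```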

5.1.3  Processor  Operation 

The I/O interface to the Move Machine consists of a start signal, a finished signal and a clear signal.
When the start signal is received, the processing cycle begins. When the machine halts (executes
a halt instruction), the finished signal is set. The clear signal resets the machine and prepares it
to receive the start signal.

The  Move  Machine  has  a  three  phase  processing  cycle.  In  the  first  phase,  the  instruction  referenced 
by  the  instruction  pointer  is  fetched  from  memory.  In  the  second  phase,  the  effective  address  is 
calculated  according  to  the  specified  addressing  mode  and  the  instruction  pointer  is  incremented 
to  reference  the  next  instruction.  In  the  third  phase,  the  fetched  instruction  is  executed. 
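Together with the start and clear signals, this processing cycle amounts to a five-state machine over the phases start, fetch, decode, execute and stop (the Proc_State values introduced below). The following Python sketch is a hypothetical illustration of those phase transitions only. It omits all data movement, represents signals as the strings '0' and '1', and uses an invented function name, `next_phase`.

```python
def next_phase(phase: str, start: str = "0", clear: str = "0",
               op_code: str = "") -> str:
    """Next abstract processor phase (start/fetch/decode/execute/stop)."""
    if phase == "start":
        return "fetch" if start == "1" else "start"   # wait for the start signal
    if phase == "fetch":
        return "decode"
    if phase == "decode":
        return "execute"
    if phase == "execute":
        return "stop" if op_code == "halt" else "fetch"
    if phase == "stop":
        return "start" if clear == "1" else "stop"    # wait for the clear signal
    raise ValueError(f"unknown phase: {phase}")


# One pass through a program whose first instruction is a halt.
trace = ["start"]
trace.append(next_phase(trace[-1], start="1"))
trace.append(next_phase(trace[-1]))
trace.append(next_phase(trace[-1]))
trace.append(next_phase(trace[-1], op_code="halt"))
trace.append(next_phase(trace[-1], clear="1"))
print(trace)  # ['start', 'fetch', 'decode', 'execute', 'stop', 'start']
```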

5.2  Specification  of  the  Move  Machine 

The first step in specifying the behavior of the Move Machine is to define abstract data types
in Refine. These types and their associated operations provide the vocabulary necessary
to describe the behavior of the Move Machine. Once this foundation is laid, the VSPEC
interface specification can be defined. First, the input, output and state variables are specified. Then
the desired behavior is described using the appropriate VSPEC clauses.

5.2.1  Abstract  Types  and  Operations 

Abstract data types and operations are specified using the Refine language. Refine supports
a host of set-theoretic data types, such as sets, sequences, tuples and maps. Sets and sequences
represent unordered and ordered collections of objects, respectively. Tuples are ordered collections
of related data, similar to a VHDL record. Maps represent a functional relation between two types;
formally, a map is a set of 2-tuples such that $M(x) = y$ means that $(x, y) \in M$. Additional
Refine constructs will be introduced as they are used in the example. Refine has a complete
array of operations for the predefined data types. For a more complete explanation of Refine
types and operations, see the Refine User's Manual [17].

Figure 3 shows the Refine specification of the Move Machine data types and operations. The
first section in Figure 3 shows the predefined VHDL types available for use within the Refine
specification. These are shown for reference, to make the example self-contained. The predefined
VHDL types are shown in all caps wherever they are used. The next section in Figure 3 is a group
of constant declarations that define the hardware configuration of the Move Machine.

The next group of declarations are the abstract data types. First, the Word type is defined as
a BIT_VECTOR. Next, the Address type is defined as an integer subrange: variables of type
Address have an integer value between 0 and MM_Size-1. The type Memory_Array is defined as
a map from Addresses to Words. This means that for a Memory_Array, M, and an Address, x, the
Word at memory location x is simply M(x). Notice that the size of a Memory_Array is restricted
by the upper bound of the Address integer range. Register_Array and Register_Id are specified
in the same manner as Memory_Array and Address.

The  abstract  type  Operation  is  defined  to  describe  the  four  possible  Move  Machine  operations. 
This  is  done  using  a  symbol.  Symbols  are  a  Refine  type  used  to  represent  an  abstract  value.  They 
are  not  strings  or  sequences  of  characters.  Each  symbol  literal  is  a  unique  atomic  value.  The  Move 
Machine’s  four  addressing  modes  are  similarly  represented  by  the  Add_Mode  type. 

The Instruction type is a 4-tuple representing the four fields of the instruction. The tuple values
are accessed in the same manner as the fields of a record: the op_code value for an Instruction, i,
is simply i.op_code.

The last data type specified is Proc_State. This type is used to represent the abstract states of the
Move Machine's operation. The three processing phases, fetch, decode and execute, are represented
along with start and stop states. The allowable actions of the Move Machine's behavior will be
expressed as transitions between these five processor states.


% REFINE move_mc_types.re: abstract types for the Move Machine.

% The following lines are needed in all Refine programs
!! in-package("RU")
!! in-grammar('user)

% predefined VHDL types and operations
% type BIT = boolean
% type BIT_VECTOR = seq(BIT)
% function bits_to_int(b: BIT_VECTOR) : INTEGER

% Move Machine constant declarations
constant MM_Size : INTEGER = 256
constant Register_Array_Size : INTEGER = 16
constant Word_Size : INTEGER = 16

% Move Machine type declarations
type Word = BIT_VECTOR
type Address = {0 .. MM_Size-1}                   % integer range
type Memory_Array = map(Address, Word)
type Register_Id = {0 .. Register_Array_Size-1}   % integer range
type Register_Array = map(Register_Id, Word)
type Operation = SYMBOL
type Add_Mode = SYMBOL
type Instruction =
    tuple(op_code : Operation, addr_mode : Add_Mode,
          reg_id : Register_Id, eff_addr : Address)
type Proc_State = SYMBOL

% Operations over the Move Machine types
function Word_to_Instr(data : Word) : Instruction =
    <Decode_Op(subseq(data, 0, 1)),
     Decode_AM(subseq(data, 2, 3)),
     bits_to_int(subseq(data, 4, 7)),
     bits_to_int(subseq(data, 8, 15))>

function Decode_Op(data : seq(BIT)) : Operation
  computed-using data = [false, false] => Decode_Op(data) = 'load,
                 data = [false, true]  => Decode_Op(data) = 'store,
                 data = [true, false]  => Decode_Op(data) = 'jump,
                 data = [true, true]   => Decode_Op(data) = 'halt

function Decode_AM(data : seq(BIT)) : Add_Mode
  computed-using data = [false, false] => Decode_AM(data) = 'absolute,
                 data = [false, true]  => Decode_AM(data) = 'immediate,
                 data = [true, false]  => Decode_AM(data) = 'indirect,
                 data = [true, true]   => Decode_AM(data) = 'relative


Figure 3: Move Machine data types and operations.

The last section in Figure 3 is the specification of Word_to_Instr, an operation that converts
Words into Instructions. This conversion is necessary because instructions are stored in memory
as words. Notice that the syntax of Refine permits simply equating the function with a tuple
construct. The values of the tuple fields are themselves function calls. The Refine subseq
operation extracts a smaller sequence from an existing sequence. This operation can be used with
the type Word because a Word is a BIT_VECTOR, which is a sequence of BITs. The functions
Decode_Op and Decode_AM precisely define the operation and addressing mode decoding scheme.
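Readers more at home in a programming language may find a Python rendering of Word_to_Instr, Decode_Op and Decode_AM helpful. The sketch below is hedged: two assumptions that Figure 3 does not state are labeled in the code. The first is that subseq(data, i, j) selects elements i through j inclusive, as the 0..15 field ranges suggest. The second is that bits_to_int treats the first bit as the most significant.

```python
from typing import NamedTuple, Sequence

# Decoding tables copied from Decode_Op and Decode_AM in Figure 3.
OPS = {(False, False): "load", (False, True): "store",
       (True, False): "jump", (True, True): "halt"}
MODES = {(False, False): "absolute", (False, True): "immediate",
         (True, False): "indirect", (True, True): "relative"}


class Instruction(NamedTuple):
    op_code: str
    addr_mode: str
    reg_id: int
    eff_addr: int


def bits_to_int(bits: Sequence[bool]) -> int:
    """Assumption: the first element of the sequence is the most significant bit."""
    value = 0
    for b in bits:
        value = (value << 1) | int(b)
    return value


def word_to_instr(data: Sequence[bool]) -> Instruction:
    """Mirror of Word_to_Instr; subseq(data, i, j) is read as data[i..j] inclusive."""
    if len(data) != 16:
        raise ValueError("a Word has 16 bits")
    return Instruction(OPS[tuple(data[0:2])], MODES[tuple(data[2:4])],
                       bits_to_int(data[4:8]), bits_to_int(data[8:16]))


word = [c == "1" for c in "0110001100001010"]
print(word_to_instr(word))
# Instruction(op_code='store', addr_mode='indirect', reg_id=3, eff_addr=10)
```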

5.2.2  VSPEC  Interface  Specification 

This  section  contains  a  detailed  description  of  the  interface  specification  for  the  Move  Machine. 
The  entire  specification  is  shown  in  Figure  4.  We  will  describe  each  section  of  this  specification 
separately,  starting  with  the  port  declaration.  This  is  where  the  entity  move_mc  is  created  and  its 
I/O  ports  are  declared  in  standard  VHDL  syntax.  The  start  and  clear  signals  are  defined  as  inputs 
and  the  finished  signal  is  defined  as  an  output.  The  Move  Machine  port  declaration  is: 


entity move_mc is
  port (Start:    in BIT;    -- Begin processing
        Clear:    in BIT;    -- Restart processing
        Finished: out BIT);  -- Processing completed


The VSPEC includes clause follows the port declaration:


includes "move_mc_types.re";
entity move_mc is
  port (Start:    in BIT;    -- Begin processing
        Clear:    in BIT;    -- Restart processing
        Finished: out BIT);  -- Processing completed

  includes "move_mc_types.re";

  state
    phase  : Proc_State,      -- Abstract Processor State
    Memory : Memory_Array,    -- Main Memory
    IP     : Address,         -- Instruction Pointer
    IR     : Instruction,     -- Instruction Register
    RGST   : Register_Array,  -- General Purpose Registers
    EA     : Address;         -- Effective Address

  ensures
    phase = start implies (Start = '1' implies phase'post = fetch)
        and (Start = '0' implies phase'post = start)
        and IP'post = 0
        and Memory'post = Memory and RGST'post = RGST
  and
    phase = fetch implies IR'post = Word_to_Instr(Memory(IP))
        and phase'post = decode and Memory'post = Memory
        and RGST'post = RGST and IP'post = IP
  and
    phase = decode implies phase'post = execute
        and (IR.addr_mode = absolute implies
               EA'post = IR.eff_addr and IP'post = IP + 1)
        and (IR.addr_mode = immediate implies
               EA'post = IP + 1 and IP'post = IP + 2)
        and (IR.addr_mode = indirect implies
               EA'post = Word_to_Instr(Memory(IR.eff_addr)).eff_addr
               and IP'post = IP + 1)
        and (IR.addr_mode = relative implies
               EA'post = IP + IR.eff_addr and IP'post = IP + 1)
        and Memory'post = Memory and RGST'post = RGST and IR'post = IR
  and
    phase = execute implies
        (IR.op_code = load implies RGST(IR.reg_id)'post = Memory(EA)
            and forall(x : Register_Id)
                  (x /= IR.reg_id implies RGST(x)'post = RGST(x))
        and (IR.op_code /= load implies RGST'post = RGST)
        and (IR.op_code = store implies Memory(EA)'post = RGST(IR.reg_id))
        and forall(x : Address) (x /= EA implies Memory(x)'post = Memory(x))
        and (IR.op_code /= store implies Memory'post = Memory)
        and (IR.op_code = jump implies IP'post = EA)
        and (IR.op_code /= jump implies IP'post = IP)
        and (IR.op_code = halt implies phase'post = stop)
        and (IR.op_code /= halt implies phase'post = fetch))
  and
    phase = stop implies Finished'post = '1'
        and (Clear = '0' implies phase'post = stop)
        and (Clear = '1' implies phase'post = start)
        and Memory'post = Memory and RGST'post = RGST
  and
    phase /= stop implies Finished'post = '0';
end move_mc;


Figure 4: VSPEC interface specification for the Move Machine
The includes clause states that this specification will use abstract types and operations defined
in the file move_mc_types.re, which was described in the previous section.

The behavior of the Move Machine is specified by describing the allowable transitions between
processor states [14]. To do this, we must first define the information that determines the processor
state. The Move Machine has a three-phase processing cycle, and each phase can be viewed as a
processor state. Adding a start and a stop state gives a set of states that uniquely describes the status
of the Move Machine at any moment in time. The abstract type Proc_State was defined specifically
for this purpose. Therefore, the state clause contains the variable phase of type Proc_State to
model the processor state:


state
  phase  : Proc_State,      -- Abstract Processor State
  Memory : Memory_Array,    -- Main Memory
  IP     : Address,         -- Instruction Pointer
  IR     : Instruction,     -- Instruction Register
  RGST   : Register_Array,  -- General Purpose Registers
  EA     : Address;         -- Effective Address


Naturally, the values of the registers and main memory are of interest when observing the behavior
of the processor, so variables of these types are declared in the state clause to model these physical
structures. In addition, any internal signals that are used to communicate between processor states
must be declared as state variables. The effective address is calculated in the decode phase, but it
is used in the execute phase. Therefore, the variable EA of type Address is declared to store the
effective address between states.

Given a set of input and state variables, the VSPEC ensures clause can be used to specify the
allowable changes to the output and state variables. In this way, the behavior of the Move Machine
is defined. The Move Machine ensures clause is structured according to the value of the phase
variable. This clarifies the specification of the state transitions that are allowed during each phase
of processor execution. The allowable transitions for each phase are then conjoined to
provide a complete behavioral specification.

The permissible next-state values must be explicitly constrained for each state variable. If a state
variable is not constrained, then it is allowed to take on any value of the associated type; it is not
assumed that unconstrained variables remain unchanged. A variable's behavior is constrained by
using the VSPEC implies operator to define the next-state values that are possible during each
processor phase. In this example, the next-state values are deterministic, but this is not a necessary
condition. Non-determinism can be modeled by a disjunction of allowable next-state values.

The first part of the ensures clause specifies which transitions are allowed during the start phase.
While in the start phase, the processor is simply waiting for the start signal to begin processing. If
the processor does not receive the start signal, it stays in the start phase. This constraint on the
next-state value of the phase variable (phase'post) is specified by the first two conjuncts implied
by the start phase. Note that the notation <variable>'post, where <variable> is the identifier for
the variable, refers to the value of the variable after the transition occurs. Here is the part of the
ensures clause that describes the start phase:


phase = start implies (Start = '1' implies phase'post = fetch)
    and (Start = '0' implies phase'post = start)
    and IP'post = 0
    and Memory'post = Memory and RGST'post = RGST

The conjunct IP'post = 0 states that the first instruction will be retrieved from memory location
0. The final two conjuncts specify that the main memory and register values must remain unchanged
during this processing phase. Without these constraints, the specification would be satisfied by an
implementation where the memory and register values change arbitrarily during this state, which
is not the desired behavior. Notice that the state variable EA is not constrained during this phase.
At this point, the EA variable does not contain any information that will affect the future state
of the machine. Therefore, the specification need not require that the value of this variable be
retained.
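Because unconstrained variables may take any value, an ensures clause behaves like a predicate over a pre-state and a post-state. The hypothetical Python sketch below transcribes the start-phase conjuncts in this way, holding the state in a dictionary keyed by the state variable names. It accepts a post-state in which EA has changed but rejects one in which memory has changed. The function name `start_phase_ok` and the dictionary representation are assumptions for illustration.

```python
from typing import Any, Dict

State = Dict[str, Any]


def start_phase_ok(pre: State, post: State, start: str) -> bool:
    """Post-condition for 'phase = start', transcribed from the ensures clause."""
    if pre["phase"] != "start":
        return True                        # the implication holds vacuously
    expected = "fetch" if start == "1" else "start"
    return (post["phase"] == expected
            and post["IP"] == 0
            and post["Memory"] == pre["Memory"]
            and post["RGST"] == pre["RGST"])   # EA is deliberately not checked


pre = {"phase": "start", "IP": 7, "Memory": {0: 5}, "RGST": {0: 1}, "EA": 3}
ok = dict(pre, phase="fetch", IP=0, EA=99)     # EA may change freely
bad = dict(ok, Memory={0: 6})                  # memory must not change
print(start_phase_ok(pre, ok, "1"), start_phase_ok(pre, bad, "1"))  # True False
```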

The  Move  Machine  behavior  during  the  fetch  phase  is  described  by: 

phase = fetch implies IR'post = Word_to_Instr(Memory(IP))
    and phase'post = decode and Memory'post = Memory
    and RGST'post = RGST and IP'post = IP

During the fetch phase, the instruction register is updated to contain the interpretation of the word at
memory location IP. Here, the interpretation is performed by the Word_to_Instr function
defined in the previous section. The next processing phase is specified to be decode, while the
memory, the general purpose registers and the instruction pointer remain unchanged.

The state changes that occur during the decode phase hinge on the addressing mode. Therefore,
the majority of the specification of the decode phase is structured around the value of IR.addr_mode:


phase = decode implies phase'post = execute
    and (IR.addr_mode = absolute implies
           EA'post = IR.eff_addr and IP'post = IP + 1)
    and (IR.addr_mode = immediate implies
           EA'post = IP + 1 and IP'post = IP + 2)
    and (IR.addr_mode = indirect implies
           EA'post = Word_to_Instr(Memory(IR.eff_addr)).eff_addr
           and IP'post = IP + 1)
    and (IR.addr_mode = relative implies
           EA'post = IP + IR.eff_addr and IP'post = IP + 1)
    and Memory'post = Memory and RGST'post = RGST and IR'post = IR


The effective address, EA, and the instruction pointer, IP, are updated according to the current
addressing mode. The next phase is specified to be the execute phase. The main memory, the CPU
registers and the instruction register are unchanged.
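The four addressing-mode cases can be read as a function from IR.addr_mode, IR.eff_addr, IP and Memory to the new values of EA and IP. In the following Python sketch, the function name `decode_phase` is hypothetical and memory words are represented as 16-bit integers. The effective-address field of a word is assumed to occupy its low-order eight bits, which is consistent with bits 8 to 15 in Figure 3 when the first bit is the most significant. The specification does not say what happens when IP + 2 exceeds MM_Size-1, so the sketch does not wrap addresses.

```python
from typing import Dict, Tuple


def decode_phase(addr_mode: str, eff_addr: int, ip: int,
                 memory: Dict[int, int]) -> Tuple[int, int]:
    """Return (EA'post, IP'post) for the decode phase of the Move Machine."""
    if addr_mode == "absolute":
        return eff_addr, ip + 1
    if addr_mode == "immediate":
        return ip + 1, ip + 2            # operand is the word after the instruction
    if addr_mode == "indirect":
        # eff_addr field of the word at IR.eff_addr; assumed to be its low 8 bits
        return memory[eff_addr] & 0xFF, ip + 1
    if addr_mode == "relative":
        return ip + eff_addr, ip + 1
    raise ValueError(f"unknown addressing mode: {addr_mode}")


memory = {40: 0x1234}  # hypothetical memory contents (16-bit words as integers)
for mode in ("absolute", "immediate", "indirect", "relative"):
    print(mode, decode_phase(mode, 40, 10, memory))
```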
The Move Machine behavior during the execute phase depends on the fetched operation. This
part of the specification is structured around the Move Machine operations:


phase = execute implies
    (IR.op_code = load implies RGST(IR.reg_id)'post = Memory(EA)
        and forall(x : Register_Id)
              (x /= IR.reg_id implies RGST(x)'post = RGST(x))
    and (IR.op_code /= load implies RGST'post = RGST)
    and (IR.op_code = store implies Memory(EA)'post = RGST(IR.reg_id))
    and forall(x : Address) (x /= EA implies Memory(x)'post = Memory(x))
    and (IR.op_code /= store implies Memory'post = Memory)
    and (IR.op_code = jump implies IP'post = EA)
    and (IR.op_code /= jump implies IP'post = IP)
    and (IR.op_code = halt implies phase'post = stop)
    and (IR.op_code /= halt implies phase'post = fetch))


For a load operation, the register identified by the current instruction is assigned the value of the
memory location referenced by the effective address. This is easily specified by RGST(IR.reg_id)'post
= Memory(EA). However, it is also necessary to specify that the remaining registers do not change.
This is the purpose of the second conjunct implied by the load operation: using the VSPEC forall
construct, it states that every register that is not involved in the load operation retains its value.
When the instruction does not specify a load operation, the values of the register array do not
change.
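The universally quantified frame condition for a load can be checked directly over the finite Register_Id range. The following Python predicate is a hedged sketch that uses a hypothetical name, `load_ok`, and holds the registers in a list indexed by register number. It shows why the forall conjunct matters: without that conjunct, a post-state that also changes register 4 would be accepted.

```python
from typing import Dict, List, Sequence

REGISTER_ARRAY_SIZE = 16


def load_ok(rgst: Sequence[int], rgst_post: Sequence[int],
            memory: Dict[int, int], ea: int, reg_id: int) -> bool:
    """Load conjuncts: the target register gets Memory(EA); the rest are unchanged."""
    target_ok = rgst_post[reg_id] == memory[ea]
    # forall(x : Register_Id) (x /= IR.reg_id implies RGST(x)'post = RGST(x))
    others_ok = all(rgst_post[x] == rgst[x]
                    for x in range(REGISTER_ARRAY_SIZE) if x != reg_id)
    return target_ok and others_ok


rgst: List[int] = [0] * REGISTER_ARRAY_SIZE
memory = {5: 99}
good = list(rgst)
good[3] = 99                 # register 3 receives Memory(5)
bad = list(good)
bad[4] = 1                   # a second register also changed
print(load_ok(rgst, good, memory, ea=5, reg_id=3))  # True
print(load_ok(rgst, bad, memory, ea=5, reg_id=3))   # False
```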

Similarly, for a store operation, the specification states that the specified memory location changes while the rest remain unchanged. The jump operation affects only the value of the instruction pointer. A halt operation causes the next phase to be the stop phase. Any other operation returns processing to the fetch phase.
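The execute-phase axioms can be read as a next-state function. The following is a minimal Python sketch, not VSPEC tooling. The dictionary-based state and the names `Instr` and `execute_phase` are hypothetical, and EA is assumed to have been computed already during the fetch phase. The sketch copies the register file and memory before making the single update; this is what the forall frame conditions demand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    # Hypothetical record mirroring IR.operation and IR.reg_id
    operation: str  # "load", "store", "jump", "halt" or any other operation
    reg_id: int


def execute_phase(rgst, memory, ip, ir, ea):
    """Return (RGST'post, Memory'post, IP'post, phase'post) for the execute phase.

    Everything not named by the operation is copied unchanged, which is what
    the forall frame conditions in the specification require.
    """
    new_rgst = dict(rgst)      # RGST'post = RGST unless a load overwrites one entry
    new_memory = dict(memory)  # Memory'post = Memory unless a store overwrites one cell
    new_ip = ip                # IP'post = IP unless the operation is a jump
    if ir.operation == "load":
        new_rgst[ir.reg_id] = memory[ea]
    elif ir.operation == "store":
        new_memory[ea] = rgst[ir.reg_id]
    elif ir.operation == "jump":
        new_ip = ea
    next_phase = "stop" if ir.operation == "halt" else "fetch"
    return new_rgst, new_memory, new_ip, next_phase


# Usage: a load copies Memory(EA) into one register and leaves the rest alone.
regs, mem = {0: 5, 1: 7}, {10: 42, 11: 3}
r2, m2, ip2, ph2 = execute_phase(regs, mem, 4, Instr("load", 1), 10)
print(r2, m2 == mem, ip2, ph2)  # {0: 5, 1: 42} True 4 fetch
```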

During  the  stop  phase,  the  processor  sets  the  finished  signal  and  monitors  the  clear  signal.  The 
stop  phase  is  specified  by: 
phase = stop implies Finished'post = '1'
   and (Clear = '0' implies phase'post = stop)
   and (Clear = '1' implies phase'post = start)
   and Memory'post = Memory and RGST'post = RGST

and

phase /= stop implies Finished'post = '0';


The  next  phase  is  determined  by  the  clear  signal.  This  part  of  the  specification  also  constrains  the 
finished  signal  to  be  low  during  every  other  phase. 
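The stop-phase clause and the Finished constraint together form a predicate over a pre-state and a post-state, so they can be checked mechanically. The sketch below is a hypothetical Python transcription. The `State` record and `stop_clause_holds` are illustrative and are not part of VSPEC. Signal values are modelled as the characters '0' and '1', and Clear is read from the pre-state, as in the unprimed specification.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    # Hypothetical snapshot of the Move Machine signals and storage
    phase: str      # e.g. "start", "fetch", "execute" or "stop"
    clear: str      # Clear input, '0' or '1'
    finished: str   # Finished output, '0' or '1'
    memory: dict = field(default_factory=dict)
    rgst: dict = field(default_factory=dict)


def stop_clause_holds(pre: State, post: State) -> bool:
    """True if (pre, post) satisfies the stop-phase clause and the Finished constraint."""
    if pre.phase == "stop":
        if post.finished != "1":
            return False
        if pre.clear == "0" and post.phase != "stop":   # Clear = '0' implies phase'post = stop
            return False
        if pre.clear == "1" and post.phase != "start":  # Clear = '1' implies phase'post = start
            return False
        return post.memory == pre.memory and post.rgst == pre.rgst
    return post.finished == "0"  # phase /= stop implies Finished'post = '0'


# Usage: leaving the stop phase when Clear is asserted.
pre = State("stop", "1", "0", {10: 42}, {0: 5})
post = State("start", "0", "1", {10: 42}, {0: 5})
print(stop_clause_holds(pre, post))  # True
```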

The full behavior of the Move Machine is modeled by conjoining the specifications for the individual phases. Figure 4 shows the entire specification for the Move Machine.


6  Related  Work 

VSPEC uses an axiomatic specification technique based on the approach developed for the Larch [7] family of specification languages. On the surface, VSPEC is a prototype Larch interface language for VHDL. Thus, many of its constructs can also be found in other Larch interface languages, most notably LM3 [12], an interface language for Modula-3. Currently, VSPEC is not a true interface language, as its semantics are defined using REFINE rather than the Larch Shared Language (LSL). However, the general concept of a language-specific axiomatic interface language combined with a means for writing auxiliary specifications is prominent.

Odyssey Research Associates (ORA) is developing a Larch interface language for VHDL [11]. This language differs from VSPEC because it is targeted at formal analysis of the system rather than at synthesis. ORA is attempting to generate a formal semantics for VHDL using LSL for proving correctness. This approach is adopted from the Ada work previously done in the Penelope project [4]. In ORA's interface language, time is the only non-functional constraint directly represented. Rather than placing constraints on pin-to-pin timing, the language uses a temporal logic based on absolute time to specify an entity's function. One can specify that a predicate P(x) must be true at time t. Thus, a system's timing constraints are intermingled with the definition of the system's function. The VSPEC notation instead specifies time intervals as constraints independent of system function. In principle, separation of concerns is a design goal for any specification language. In practice, including temporal aspects in the functional specification requires theorem provers capable of temporal reasoning. Currently, there are few such production-quality provers. In VSPEC, the information needed for constraint verification is included, but one may choose which characteristics to verify.

VAL [1] is another attempt to annotate VHDL. VAL (VHDL Annotation Language) is based on similar work done with Anna for Ada programs [15]. VAL differs from VSPEC because it annotates a specific VHDL design rather than representing the requirements for a system not yet designed. VSPEC clauses may access only ports defined by the entity and variables defined locally in the specification. VAL annotations, by contrast, appear throughout the VHDL specification and formally document its behavior. Any local variable may be referenced in a VAL annotation, and specific aspects of both the structural and the behavioral implementation are documented in the annotation. VAL's intent is to document a design for verification, whereas VSPEC's intent is to define requirements for a system.


7  Current  Status  and  Future  Directions 

Current VSPEC research involves pursuing domain-specific support for prototype synthesis. The role of VSPEC in the COMET system is as a requirements specification language and as input to synthesis tools. Thus, we are working to develop techniques to transform VSPEC into behavioral and structural VHDL. An important related technology transfer issue is developing a handbook of reusable specifications. In the Larch tradition, a handbook is simply a collection of reusable theories defined in the shared language. Handbook theories represent commonly used structures, algorithms and characteristics, as well as domain-specific information. For VHDL, the libraries currently being implemented include theories representing standard VHDL types, low-level logic functions, signal attributes and conversion routines. Theories for pin-to-pin timing, heat dissipation, power consumption, area and clock speed have been implemented to support constraint checking during the design process.

We are beginning an effort to make VSPEC a true Larch interface language. Specifically, we are defining each of its constructs using LSL and developing tools for manipulating the specifications. Of particular interest is the representation of parallel components. Each entity structure exists asynchronously in parallel with other entities in the same design, and representing such parallelism in VSPEC is a current area of research.

A prototype VSPEC parser has been developed. It will be used to drive synthesis tools and the translation from VSPEC to LSL. The parser is built with the Software Refinery's Dialect tool and parses VHDL-93 with VSPEC extensions into an abstract syntax tree. This data structure serves as the basis for interfacing VSPEC with other tools.
Abstract

This paper discusses a scheduling technique for pipelined hardware-software codesigns. The technique uses scheduling and retiming to optimize the performance of a given codesign. The paper presents heuristics for scheduling and retiming that aim to optimize the throughput and memory requirements of a given codesign. The effectiveness of the technique is demonstrated experimentally.

1  Introduction 

Hardware-software codesigns are characterized by strict performance constraints. The codesign process partitions the system specification into interacting hardware (HW) and software (SW) tasks that exhibit the desired behavior and satisfy the performance requirements. In a typical codesign flow, the HW-SW partitioner and the scheduler execute iteratively until a design that satisfies the constraints is obtained. Many digital signal processing (DSP) algorithms are loop oriented, which makes them suitable for pipelined codesign implementation. In this paper we present a technique that optimizes the throughput and memory requirements of pipelined codesigns by scheduling and retiming.

The system specification is captured in an intermediate graph format called the Data Dependency Graph (DDG). The vertices of the graph represent the tasks, and the edges represent the data dependencies among the various tasks. The granularity of the tasks is determined by the user. The execution times of the tasks on the SW processor and in HW are obtained by profiling and by HW performance estimation, respectively [5]. These times are stored in the graph representation. The edges contain information about the number of variables across a dependence. The DDG representation is discussed in detail in Section 3.

*This work was partially supported by the ARPA RASSP program, monitored by the Wright Lab, US-AF, under contract number F33615-93-C-1316, and by the ARPA HPCC program, monitored by the FBI under contract number J-FBI-93-116.

The codesign architecture consists of a single general-purpose SW processor, a single application-specific integrated circuit (ASIC) and a shared memory (Figure 1). The SW processor and the ASIC are connected to the shared memory through the system bus. The general-purpose processor and the ASIC are themselves non-pipelined with respect to task execution. That is, a new task cannot begin execution before the previous one has finished. Communication between tasks bound to different resources (that is, from SW to HW or from HW to SW) takes place through the shared memory. Data transfers between two tasks bound to the ASIC also take place through the shared memory. The shared memory is exclusive-read exclusive-write, and therefore no two tasks can read or write it at the same time.


Figure  1:  Codesign  Architecture 


The throughput of loop-oriented codesigns can be maximized by obtaining a pipelined implementation. The drawback of pipelining is that it increases the memory requirement of the design. Consider the DDG shown in Figure 2. It consists of three tasks, shown as bubbles in the figure. The binding and execution times of the tasks are shown beside each bubble. The data dependencies are shown as directed edges, and the data items transferred by each dependency are written next to the edges. The memory read and write times are also shown in the figure.

We assume that the DDG is executed a number of times inside a loop. The non-pipelined and pipelined implementations of the design are shown in Figure 3. The rectangles in Figure 3 represent the execution of the various tasks. Each rectangle contains the task number and the number of the loop iteration to which it belongs. The small rectangles marked "r" and "w" represent memory reads and writes, respectively. We assume that a task, while executing, needs memory space for both its read set and its write set. The read (write) set of a task is the set of data items read (written) by the task.

As the figure shows, the non-pipelined implementation takes 374 t-units to complete one iteration of the loop and requires 12 memory units. The pipelined implementation overlaps the execution of tasks belonging to different iterations of the loop. When the pipeline is fully loaded, its steady state completes one iteration in 269 t-units, a definite improvement on the previous design. However, it requires 17 memory units.

Figure 2: DDG Example (data dependency graph; memory read time = 1 t-unit per data item, memory write time = 1 t-unit per data item)
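In relative terms, the numbers quoted for Figure 3 show the trade-off. Pipelining raises throughput by roughly 39% and costs roughly 42% more memory (values rounded):

$$\frac{1/269}{1/374} = \frac{374}{269} \approx 1.39, \qquad \frac{17}{12} \approx 1.42$$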

The paper presents a technique for optimizing the performance of pipelined codesigns. The technique uses a list-based scheduler [1] and retiming transformations [2] to obtain a pipelined codesign. The paper presents heuristics for both scheduling and retiming that try to maximize the throughput of the design while minimizing its memory requirements.

The  paper  is  organized  as  follows.  In  Section  2  we 
discuss  previous  work,  in  Section  3  we  describe  the 
DDG  representation,  Section  4  presents  the  pipeline 
scheduling  technique,  the  experimental  results  are  in 
Section  5  and  finally  Section  6  concludes  the  paper. 
Figure 3: Non-pipeline and Pipeline Implementation


2  Previous  Work 

Based on their application area, existing codesign methodologies can be broadly classified into two categories. Category one includes methodologies oriented towards real-time reactive systems [7] [8] [11] [13]. Scheduling in reactive systems is done to ensure that time constraints and data dependencies between different processes are satisfied [12]. Category two contains methodologies meant for data processing applications [10]. Design methodologies for such applications use scheduling to maximize the throughput of a given codesign partition. Our codesign flow falls into category two.

In this paper we present a scheduling heuristic for optimizing the throughput and memory requirements of a design. Pipelining is an effective way of maximizing the throughput of a loop-oriented design. Other research [9] has used pipelining for mixed applications that include both control constructs and data processing tasks. We use retiming [2] to generate pipelined designs. The formalism for the problem description and the general technique are described in [3], and we adopt them in this paper. The retiming heuristics in [3] aim at obtaining pipelined implementations with optimum throughput. In this paper we present a scheduler interacting with a retimer to optimize both the throughput and the memory requirements of pipelined codesign applications.

3  Data  Dependency  Graph 

The input specification is captured by an intermediate graph called the Data Dependency Graph (DDG). It represents the tasks by vertices and the data dependencies between tasks by directed edges. The vertices hold information about the task binding (HW or SW), the HW execution time and the SW execution time. The edges hold information about the number of variables in a dependence. Since we are interested in pipelining the design, we associate an iteration index ($\lambda$) with each vertex and a dependence distance ($\delta$) with each edge [3].

The iteration index $\lambda(u)$ of a task $u$ indicates that, at the $i$-th iteration of the steady state, the instance of task $u$ belonging to iteration $i + \lambda(u)$ of the loop is executed. For example, consider the pipelined design in Figure 3. In the first iteration of the steady state, the instance of task 1 belonging to the second iteration of the loop is executed, so $\lambda(\text{task } 1) = 1$.

The dependence distance $\delta(e)$ of an edge $e$ indicates the number of iterations of the steady state traversed by that edge. In the pipelined implementation in Figure 3, the data produced by task 1 at the $i$-th iteration of the steady state is consumed by task 2 at the $(i+1)$-th iteration of the steady state. Hence the dependence distance of edge (1, 2) is $\delta(1, 2) = 1$. We now formalize the DDG representation as follows:

A DDG is a 4-tuple $DDG = G(V, E, \lambda, \delta)$, where:

• $V$ is the set of vertices. Each vertex $u \in V$ represents a task. For each task $u \in V$ the following information is available:

  - $u_{bind}$: the binding of the task, that is, whether it is going to be implemented in HW or SW.

  - $u_{sw}$: the SW runtime of the task on the general-purpose processor for a particular input data set.

  - $u_{hw}$: the HW runtime of the task, if it were implemented as an ASIC, for the same input data.

• $E$ is the set of directed edges. Each $e = (u, v) \in E$ represents a data dependence between tasks $u$ and $v$. Every edge has information about the number of variables ($e_{var}$) represented by the dependence.

• $\lambda$ and $\delta$ are two mappings, $\lambda : V \to \mathbb{N}$ and $\delta : E \to \mathbb{N}$. They represent the iteration index ($\lambda$) and the number of iterations traversed by the dependence ($\delta$). $\mathbb{N}$ is the set of natural numbers.

Initially, $\lambda(u) = 0$ for all $u \in V$. Notice that the representation has no control flow constructs; it is strictly data flow. We now explain and formalize the terms and expressions used in the rest of the paper.
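As an illustration only, the 4-tuple can be held in a few Python dataclasses. The class and field names (`Task`, `Edge`, `DDG`, `bind`, `sw`, `hw`, `var`, `delta`, `lam`) are hypothetical. The sketch records the per-task and per-edge information listed above and initializes $\lambda$ to 0 for every task. The runtimes in the usage line are made-up values, not the ones in Figure 2.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    bind: str  # u_bind: "hw" or "sw"
    sw: int    # u_sw: SW runtime on the general-purpose processor
    hw: int    # u_hw: HW runtime if implemented in the ASIC


@dataclass
class Edge:
    var: int        # e_var: number of variables carried by the dependence
    delta: int = 0  # delta(e): iterations of the steady state traversed


@dataclass
class DDG:
    tasks: dict                              # V: task name -> Task
    edges: dict                              # E: (u, v) -> Edge
    lam: dict = field(default_factory=dict)  # lambda: task name -> iteration index

    def __post_init__(self):
        # Initially lambda(u) = 0 for every u in V.
        for u in self.tasks:
            self.lam.setdefault(u, 0)


g = DDG({"t1": Task("sw", 100, 20), "t2": Task("hw", 80, 30)},
        {("t1", "t2"): Edge(var=4)})
print(g.lam)  # {'t1': 0, 't2': 0}
```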

The latency of a task $u$, $L_u$, is the total execution time of the task. It is the sum of three parts: the task's read time, its execution time on the resource it has been bound to, and its write time. The read (write) time of a task is the product of the number of variables read (written) by the task and the memory read (write) time. Since there are only two resources, the execution time of a task is $u_{sw}$ if $u_{bind} = sw$, or $u_{hw}$ if $u_{bind} = hw$.
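The latency definition is a simple sum. The sketch below relies on a hypothetical reading of "number of variables read (written)": a task reads the variables on its incoming edges and writes those on its outgoing edges. The function name `latency` and the plain-dict inputs are also illustrative. The default read and write times of 1 match the 1 t-unit per data item stated for Figure 2.

```python
def latency(u, bind, sw, hw, edges, read_time=1, write_time=1):
    """Latency L_u = read time + execution time on the bound resource + write time.

    bind, sw, hw: dicts mapping task -> binding ("hw" or "sw"), SW runtime, HW runtime.
    edges: dict mapping (producer, consumer) -> e_var.
    Assumption: a task reads the variables on its incoming edges and writes
    the variables on its outgoing edges.
    """
    n_read = sum(n for (src, dst), n in edges.items() if dst == u)
    n_written = sum(n for (src, dst), n in edges.items() if src == u)
    exec_time = sw[u] if bind[u] == "sw" else hw[u]
    return n_read * read_time + exec_time + n_written * write_time


# Hypothetical two-task example (values are illustrative, not from Figure 2).
bind = {"a": "sw", "b": "hw"}
sw = {"a": 50, "b": 90}
hw = {"a": 10, "b": 30}
edges = {("a", "b"): 4}
print(latency("a", bind, sw, hw, edges), latency("b", bind, sw, hw, edges))  # 54 34
```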

For a particular pipeline implementation, the initiation interval, II, is the time taken for one iteration of the steady state. For example, in Figure 3 the pipelined implementation has II = 269 t-units. Given a DDG and an architecture, it is possible to establish a lower bound on the initiation interval. This bound is called the minimum initiation interval, MII, and it is limited by two factors.

First, the architecture resources limit the MII. This bound is called the resource-constrained MII, ResMII. For example, the DDG in Figure 2 requires at least 212 t-units to execute tasks 1 and 3, which are bound to SW. The SW resource-constrained MII, $\mathit{ResMII}_{SW}$, is given by the sum of the latencies of all tasks bound to SW. Similarly, the HW resource-constrained MII, $\mathit{ResMII}_{HW}$, is the sum of the latencies of all tasks bound to HW. ResMII for the DDG is then the maximum of the two, that is, $\mathit{ResMII} = \max(\mathit{ResMII}_{SW}, \mathit{ResMII}_{HW})$.

Second, recurrences (cycles) in the DDG also limit the MII. This bound is called the recurrence-constrained MII, RecMII. For example, consider the DDG shown in Figure 2, and assume that we add an extra dependency $e = (2, 1)$ with $\delta(2, 1) = 1$. In that case the pipelined implementation in Figure 3 becomes invalid. The instance of task 1 belonging to the second iteration cannot start executing before the instance of task 2 belonging to the first iteration of the loop has finished. The recurrence in the DDG introduces this constraint. The $\mathit{RecMII}_r$ of a recurrence $r$ is the ratio of the sum of the latencies of the tasks in the recurrence to the sum of the weights ($\delta$) of all the dependencies in the recurrence:

$$\mathit{RecMII}_r = \frac{\sum_{u \in r} L_u}{\sum_{e \in r} \delta(e)}$$

A graph may have more than one cycle. RecMII is then the maximum over all of them, that is, $\mathit{RecMII} = \max_r(\mathit{RecMII}_r)$ over all recurrences $r$ in the DDG. The MII is then the maximum of ResMII and RecMII, that is, $\mathit{MII} = \max(\mathit{ResMII}, \mathit{RecMII})$. The maximum execution throughput of a DDG, MaxTh, is the maximum number of iterations of the steady state possible in one time unit. It is given by:

$$\mathit{MaxTh} = \frac{1}{\mathit{MII}}$$
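The MII bounds can be computed directly from the definitions. The following Python sketch makes several assumptions. The DDG is a plain dict of latencies, bindings and distances, and `simple_cycles` and `compute_mii` are hypothetical helper names. The recurrences are found by a plain depth-first enumeration of simple cycles, which suits only small graphs. RecMII is kept as an exact fraction; a scheduler working in whole time units would presumably round MII up, but that is also an assumption. The latencies in the usage example are made up.

```python
from fractions import Fraction


def simple_cycles(nodes, succ):
    """Enumerate every simple cycle once, rooted at its earliest node in `nodes`.

    Plain depth-first search; adequate for the small task graphs used here.
    """
    order = {n: i for i, n in enumerate(nodes)}
    cycles = []

    def dfs(start, node, path):
        for nxt in succ.get(node, []):
            if nxt == start:
                cycles.append(list(path))  # path closes back on its root
            elif order[nxt] > order[start] and nxt not in path:
                path.append(nxt)
                dfs(start, nxt, path)
                path.pop()

    for s in nodes:
        dfs(s, s, [s])
    return cycles


def compute_mii(latency, bind, delta):
    """Return (ResMII, RecMII, MII).

    latency: task -> L_u; bind: task -> "hw" or "sw";
    delta: (u, v) -> dependence distance delta(u, v).
    """
    res_sw = sum(lu for u, lu in latency.items() if bind[u] == "sw")
    res_hw = sum(lu for u, lu in latency.items() if bind[u] == "hw")
    res_mii = max(res_sw, res_hw)

    succ = {}
    for u, v in delta:
        succ.setdefault(u, []).append(v)

    rec_mii = Fraction(0)
    for cyc in simple_cycles(list(latency), succ):
        cyc_edges = list(zip(cyc, cyc[1:] + cyc[:1]))
        dist = sum(delta[e] for e in cyc_edges)
        if dist == 0:
            raise ValueError(f"recurrence {cyc} has zero total distance")
        rec_mii = max(rec_mii, Fraction(sum(latency[u] for u in cyc), dist))
    return res_mii, rec_mii, max(res_mii, rec_mii)


# Hypothetical latencies; the extra edge (t2, t1) with delta = 1 forms a recurrence.
lat = {"t1": 54, "t2": 34, "t3": 40}
bind = {"t1": "sw", "t2": "hw", "t3": "sw"}
dist = {("t1", "t2"): 0, ("t2", "t3"): 0, ("t2", "t1"): 1}
res, rec, m = compute_mii(lat, bind, dist)
print(res, rec, m, Fraction(1) / m)  # 94 88 94 1/94  (the last value is MaxTh)
```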

4  Pipeline  Scheduling  Technique 

The objective of the technique is to obtain a pipeline schedule of the DDG that has MII as its initiation interval and requires the least amount of shared memory. The pipeline schedule of the DDG determines the steady state of the pipeline. The flow diagram of the technique is shown in Figure 4. The inputs to the pipeline scheduler are the partitioned DDG, the codesign architecture and a desired upper bound on the initiation interval, MaxII.

The pipeline scheduler first calculates the MII for the design. It then tries to schedule the DDG in MII time. If it is unsuccessful, it selects a dependency to be retimed. Retiming, as we will see later, transforms a schedule-constraining dependency into a free scheduling dependency, which does not constrain the scheduler. In this process, however, it increases the iteration indices of some tasks. Hence retiming produces a DDG with tasks belonging to different iterations of the steady state. In other words, retiming produces a pipelined DDG.

This inner loop of scheduling and retiming continues until a successful schedule is found or all the dependencies have been retimed. In the latter case we increase the initiation interval and try scheduling again. The increment is the larger of one time unit and one percent of MII. We exit the outer loop when the initiation interval II becomes greater than the user-specified MaxII.
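The two nested loops just described can be sketched as a driver function. This is a hypothetical Python outline, not the authors' implementation. The callbacks `try_schedule`, `pick_dependency` and `retime` stand in for the list scheduler and retiming heuristic of Sections 4.1 and 4.2. Each larger II restarts from the input DDG, and the one-percent step is rounded up to a whole time unit; the text states neither choice explicitly, so both are assumptions.

```python
import math


def pipeline_schedule(ddg, mii, max_ii, try_schedule, pick_dependency, retime):
    """Outer (II) and inner (retiming) loops of the pipeline scheduling technique.

    Hypothetical callbacks:
      try_schedule(ddg, ii)              -> a schedule, or None on failure
      pick_dependency(ddg, ii, retimed)  -> a dependency not yet retimed, or None
      retime(ddg, dep)                   -> the retimed DDG
    Returns (ii, ddg, schedule), or None if no schedule exists with II <= MaxII.
    """
    step = max(1, math.ceil(mii / 100))  # larger of one time unit and 1% of MII
    ii = mii
    while ii <= max_ii:
        current, retimed = ddg, set()  # assumption: each II restarts from the input DDG
        while True:
            schedule = try_schedule(current, ii)
            if schedule is not None:
                return ii, current, schedule
            dep = pick_dependency(current, ii, retimed)
            if dep is None:  # all dependencies have been retimed
                break
            retimed.add(dep)
            current = retime(current, dep)
        ii += step
    return None


# Toy callbacks: scheduling succeeds only once II reaches 105.
result = pipeline_schedule("g", 100, 110,
                           lambda d, ii: "ok" if ii >= 105 else None,
                           lambda d, ii, done: None,
                           lambda d, dep: d)
print(result)  # (105, 'g', 'ok')
```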

The inputs to the scheduler are the DDG and the expected initiation interval II. The objective of the scheduler is to obtain a pipeline schedule of the DDG in II time using the least amount of shared memory. The schedule assigns a start time $S(u)$ to each task $u$ in the graph such that $0 \le S(u) < \mathit{II}$ [3]. For a dependency $e = (u, v)$, the schedule times of $u$ and $v$ must honor the data dependence. That is, $S(v) + \delta(u, v) \cdot \mathit{II} \ge S(u) + L_u$, or equivalently $S(v) \ge S(u) + L_u - \delta(u, v) \cdot \mathit{II}$. In addition, there must be enough resources and shared memory to execute a task scheduled at a particular time instant. The memory requirement of a task during execution is the total memory required by the variables in the task's read set and write set. The pipeline schedule of a DDG is then formalized as follows:

For a given II, a pipeline schedule of $DDG = G(V, E, \lambda, \delta)$ is an integer labeling $S : V \to \mathbb{N}$ that fulfills the following conditions:

• $\forall u \in V$, $0 \le S(u) < \mathit{II}$.

• $\forall (u, v) \in E$, $S(v) \ge S(u) + L_u - \mathit{II} \cdot \delta(u, v)$; that is, all dependences must be honored.

• There are sufficient resources (HW and SW) to execute the task scheduled at a particular time instant.

• There is sufficient memory to execute the task scheduled at a particular time instant.

Figure 4: Pipeline Scheduling Technique
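The first two conditions are purely arithmetic, so they can be checked for a candidate schedule. The following Python sketch uses a hypothetical function name, `check_schedule`, and dict inputs. It deliberately leaves out the resource and memory conditions, which depend on the list scheduler's bookkeeping.

```python
def check_schedule(start, latency, delta, ii):
    """Return violations of the first two pipeline-schedule conditions.

    start: task -> S(u); latency: task -> L_u; delta: (u, v) -> delta(u, v).
    The resource and memory conditions are not checked by this sketch.
    """
    problems = []
    for u, s in start.items():
        if not 0 <= s < ii:
            problems.append(f"S({u}) = {s} outside [0, {ii})")
    for (u, v), d in delta.items():
        bound = start[u] + latency[u] - ii * d  # S(v) >= S(u) + L_u - II * delta(u, v)
        if start[v] < bound:
            problems.append(f"dependence ({u}, {v}) violated: S({v}) = {start[v]} < {bound}")
    return problems


lat = {"a": 40, "b": 30}
print(check_schedule({"a": 0, "b": 20}, lat, {("a", "b"): 0}, 100))
# ['dependence (a, b) violated: S(b) = 20 < 40']
print(check_schedule({"a": 50, "b": 0}, lat, {("a", "b"): 1}, 100))  # []
```

With $\delta = 1$, task b may start before task a in the steady state. This is because it consumes data produced in the previous steady-state iteration.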

Schedule Constraining Dependencies. For a given initiation interval II, the data dependencies in a DDG can be classified into three kinds [3]: positive scheduling dependencies (PSDs), negative scheduling dependencies (NSDs) and free scheduling dependencies (FSDs). A dependency $(u, v)$ is a PSD if $L_u - \mathit{II} \cdot \delta(u, v) > 0$, and an FSD or NSD if $L_u - \mathit{II} \cdot \delta(u, v) \le 0$.

PSDs constrain scheduling because they force $S(v) > S(u)$; in other words, task $v$ must be scheduled later than task $u$. FSDs do not constrain scheduling. NSDs could constrain a schedule in two cases: if pipelined resources are used, or if an iteration of the steady state begins before the previous one finishes (a non-rectangular schedule). Neither condition holds in our case, so NSDs do not constrain the schedule. The set of schedule-constraining dependencies $E_s$ is then given by:

$$E_s = \{(u, v) \in E \mid L_u - \mathit{II} \cdot \delta(u, v) > 0\}$$
PSDs are also called intra-loop dependencies (ILDs), and FSDs and NSDs together are called loop-carried dependencies (LCDs). A dependency $(u, v)$ is an ILD if $\delta(u, v) = 0$ and an LCD if $\delta(u, v) > 0$.
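Both classifications can be computed side by side. The following Python sketch uses a hypothetical `classify` helper with dict inputs. It computes the PSD/FSD-NSD split from the latency test and the ILD/LCD split from $\delta$ separately. For the example values below the two splits coincide, as the text states. They could differ only if some latency $L_u$ exceeded II.

```python
def classify(latency, delta, ii):
    """Classify dependencies for a given II.

    Returns (E_s, free, ILD, LCD): E_s holds the PSDs (L_u - II * delta > 0),
    free holds the FSDs and NSDs, ILD has delta == 0 and LCD has delta > 0.
    """
    e_s = {e for e, d in delta.items() if latency[e[0]] - ii * d > 0}
    free = set(delta) - e_s
    ild = {e for e, d in delta.items() if d == 0}
    lcd = {e for e, d in delta.items() if d > 0}
    return e_s, free, ild, lcd


lat = {"a": 40, "b": 30, "c": 20}
dist = {("a", "b"): 0, ("b", "c"): 1, ("c", "a"): 1}
e_s, free, ild, lcd = classify(lat, dist, 100)
print(sorted(e_s), sorted(lcd))  # [('a', 'b')] [('b', 'c'), ('c', 'a')]
```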

Given a set of schedule-constraining dependencies, we can define two properties for every task. The first, the height of the task, $H(u)$, gives the as-soon-as-possible (ASAP) schedule time of the task. The second, the depth of the task, $D(u)$, measures how "urgently" the task needs to be scheduled. It is given by:

$$D(u) = \begin{cases} L_u, & \text{if there is no } (u, v) \in E_s \\ \max_{e = (u, v) \in E_s} \big(D(v) + L_u - \mathit{II} \cdot \delta(e)\big), & \text{otherwise} \end{cases}$$

For an initiation interval II, $\mathit{II} - D(u)$ gives the as-late-as-possible (ALAP) schedule time of the task. Both quantities can be calculated by a breadth-first search of the DDG.
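The depth recurrence translates directly into code. The text gives no explicit formula for $H(u)$, so the sketch assumes the natural ASAP recurrence that follows from the dependence constraint. That recurrence is $H(v) = \max_{(u,v) \in E_s} (H(u) + L_u - \mathit{II} \cdot \delta(u,v))$, with $H = 0$ for tasks that have no constraining predecessor. The sketch uses memoized recursion instead of the breadth-first search mentioned above, and the name `heights_and_depths` is hypothetical.

```python
def heights_and_depths(tasks, latency, delta, ii):
    """Compute H(u), D(u) and the ALAP time II - D(u) over E_s.

    E_s = {(u, v) : L_u - II * delta(u, v) > 0}. Assumes E_s is acyclic, which
    holds when II >= RecMII (every PSD has positive weight, whereas the weights
    around any recurrence sum to <= 0 at such an II).
    """
    weight = {e: latency[e[0]] - ii * d for e, d in delta.items()}
    succ = {u: [] for u in tasks}
    pred = {u: [] for u in tasks}
    for (u, v), w in weight.items():
        if w > 0:  # only schedule-constraining dependencies
            succ[u].append(v)
            pred[v].append(u)

    depth, height = {}, {}

    def d(u):
        # D(u) = L_u without outgoing PSDs, else max over (u, v) of D(v) + L_u - II * delta.
        if u not in depth:
            depth[u] = max((d(v) + weight[(u, v)] for v in succ[u]), default=latency[u])
        return depth[u]

    def h(u):
        # Assumed ASAP recurrence: H(v) = max over (u, v) in E_s of H(u) + L_u - II * delta.
        if u not in height:
            height[u] = max((h(p) + weight[(p, u)] for p in pred[u]), default=0)
        return height[u]

    for u in tasks:
        d(u)
        h(u)
    alap = {u: ii - depth[u] for u in tasks}
    return height, depth, alap


lat = {"a": 40, "b": 30, "c": 20}
dist = {("a", "b"): 0, ("b", "c"): 0, ("c", "a"): 1}
H, D, ALAP = heights_and_depths(["a", "b", "c"], lat, dist, 100)
print(H, D, ALAP)
# {'a': 0, 'b': 40, 'c': 70} {'c': 20, 'b': 50, 'a': 90} {'a': 10, 'b': 50, 'c': 80}
```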

A path $p = \{e_1, \ldots, e_n\}$ is called a positive path if every $e \in p$ is a PSD. The length of $p$ is:

$$\mathrm{Length}(p) = L_w + \sum_{(u, v) \in p} \big(L_u - \mathit{II} \cdot \delta(u, v)\big)$$

where $L_w$ is the latency of the tail task of the positive path. For a given task, the largest such length over all positive paths headed at that task is the depth of the task. A maximal positive path (MPP) of a DDG is a positive path $p$ such that $\mathrm{Length}(p) \ge \mathrm{Length}(p')$ for any other positive path $p'$. The length of the MPP of a DDG is then given by:

$$\mathit{MPP} = \max_{u \in V} D(u)$$

For a feasible schedule of a DDG with initiation interval II,

$$\mathit{MPP} \le \mathit{II}$$
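The path length and the feasibility test can be evaluated on small examples. The following Python sketch uses hypothetical helper names (`path_length`, `mpp_feasible`). The edge list in the usage example is a positive path whose length equals the depth of its head task. Checking that length against II reproduces the necessary condition $\mathit{MPP} \le \mathit{II}$.

```python
def path_length(path, latency, delta, ii):
    """Length(p) = L_w + sum over (u, v) in p of (L_u - II * delta(u, v)).

    path: list of edges [(u1, u2), (u2, u3), ...]; w is the tail task.
    """
    if any(latency[u] - ii * delta[(u, v)] <= 0 for u, v in path):
        raise ValueError("not a positive path")
    tail = path[-1][1]
    return latency[tail] + sum(latency[u] - ii * delta[(u, v)] for u, v in path)


def mpp_feasible(depths, ii):
    """Length of the maximal positive path, max_u D(u), and whether it fits in II."""
    mpp = max(depths.values())
    return mpp, mpp <= ii


lat = {"a": 40, "b": 30, "c": 20}
dist = {("a", "b"): 0, ("b", "c"): 0}
print(path_length([("a", "b"), ("b", "c")], lat, dist, 100))  # 90
print(mpp_feasible({"a": 90, "b": 50, "c": 20}, 80))          # (90, False)
```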

Calculation of Memory Requirement. Now let us consider the memory requirements of a pipeline schedule. We assume that memory is reserved for the write set of a task as soon as the task begins execution. This memory remains reserved until the task that uses the data finishes execution. In other words, memory is reserved for some data as soon as the producer task begins execution, and it is freed once the consumer task finishes execution.

In a pipeline schedule, the memory requirement is due to ILDs and LCDs. ILDs do not cross the boundary between two consecutive iterations of the steady state: all the data belonging to any ILD is produced and consumed within one iteration of the steady state. LCDs cross the boundary between two iterations of the steady state, and depending on their distance ($\delta$) they might cross more than one boundary. Hence, before an iteration of the steady state can begin execution, some memory is already occupied by LCD data, given by:

$$\mathit{Mem}_{LCD} = \sum_{e \in LCD} e_{var} \times \delta(e)$$

$\mathit{Mem}_{LCD}$ is the same at the beginning of each iteration of the steady state, so the pipelined design needs at least $\mathit{Mem}_{LCD}$ memory. The memory required during one iteration of the steady state, $\mathit{Mem}_{exec}$, is the maximum amount of memory occupied by the data items during execution. This memory is due to both ILDs and LCDs. The memory requirement of a pipelined design, MemReq, is then given by:

$$\mathit{MemReq} = \max(\mathit{Mem}_{LCD}, \mathit{Mem}_{exec})$$
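The formulas can be evaluated as follows. In this hypothetical Python sketch, the caller supplies each data item live during one steady-state iteration as a (reserve time, free time, size) interval. Following the reservation rule above, memory is reserved when the producer starts and freed when the consumer finishes. $\mathit{Mem}_{exec}$ is the peak of a sweep over those intervals. The half-open interval convention (frees at an instant are applied before reservations) and the function name `memory_requirement` are assumptions.

```python
def memory_requirement(lcd, intervals):
    """Return (Mem_LCD, Mem_exec, MemReq) with MemReq = max(Mem_LCD, Mem_exec).

    lcd: list of (e_var, delta) pairs, one per loop-carried dependency.
    intervals: list of (reserve_time, free_time, size) for every data item live
    during one steady-state iteration (ILD and LCD data alike).
    """
    mem_lcd = sum(var * d for var, d in lcd)
    events = []
    for start, end, size in intervals:
        events.append((start, size))  # reserved when the producer starts
        events.append((end, -size))   # freed when the consumer finishes
    events.sort(key=lambda ev: (ev[0], ev[1]))  # frees before reservations at a tie
    occupied = peak = 0
    for _, change in events:
        occupied += change
        peak = max(peak, occupied)
    return mem_lcd, peak, max(mem_lcd, peak)


print(memory_requirement([(4, 1), (2, 1)], [(0, 50, 4), (10, 60, 3), (50, 90, 5)]))
# (6, 8, 8)
```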

In  the  next  section  we  discuss  the  list  based  schedul¬ 
ing  algorithm. 

4.1  List  Based  Scheduler 

We use a list-based scheduler to schedule the DDG on the codesign architecture. The scheduler maintains three ready lists, one each for the HW, SW and memory resources. The execution of a task can be divided into three states. When a task is selected from either the HW or the SW ready list, it first goes into the read state. When the task has finished reading, it goes into the run state. It then goes into the write state while it writes data to the shared memory.

A task in the read or write state could cause a memory conflict with another task. The scheduler resolves such conflicts by maintaining a ready list for the memory resource. A task is added to the HW or SW ready list when all its predecessor tasks have been scheduled. When a task is selected to be scheduled on a particular resource, it goes into the read state and is added to the memory ready list. On completion of its read operation, the task runs on the appropriate resource. It is added to the memory ready list again when it goes into its write state.

The scheduler uses the same heuristic priority function to select a task from each of the three ready lists. The priority of a task depends on the following four properties, in descending order of importance:

1. 0-Mobility: The mobility of a task is the difference between its ALAP and ASAP times. The ASAP time may change during scheduling and is updated accordingly. The ALAP time of a task is constant for a given initiation interval. If a task has 0-mobility, it must be scheduled at that time; otherwise the timing constraints will be violated.

2. Mobility: A task with lower mobility is selected to be scheduled before a task with greater mobility. This is a well-established heuristic that is known to produce good results.

3. Difference between the number of read and write variables (or data items): The memory requirement of a schedule is given by the maximum memory occupied by the data items during one iteration of the steady state. A task that reads more variables than it writes reduces the number of variables present in the memory, so it should be scheduled near its ASAP time. Conversely, a task that writes more variables than it reads should be scheduled near its ALAP time.

4. Number of successors: A list scheduling algorithm performs better when it has more choice in the ready list. Hence a task whose completion adds more tasks to the ready list is preferred.

A task with 0-mobility is always selected from the ready list. If no task has 0-mobility, we use property 2 to select a task, and properties 3 and 4 (in that order) to break ties. In the next section we present the retiming heuristic.
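The selection rule can be written as a lexicographic sort key. The following Python sketch uses hypothetical names (`ReadyTask`, `priority_key`, `select`). Smaller keys win, so 0-mobility tasks come first, then lower mobility, then a larger surplus of reads over writes, then more successors. Because mobility is normally non-negative, the explicit 0-mobility flag only matters if a negative mobility (an already infeasible task) could occur.

```python
from dataclasses import dataclass


@dataclass
class ReadyTask:
    name: str
    asap: int        # current ASAP time (updated during scheduling)
    alap: int        # ALAP time, fixed for a given II
    reads: int       # number of variables read
    writes: int      # number of variables written
    successors: int  # number of successor tasks


def priority_key(t: ReadyTask):
    """Smaller key = higher priority, following properties 1 to 4 in order."""
    mobility = t.alap - t.asap
    return (mobility != 0, mobility, -(t.reads - t.writes), -t.successors)


def select(ready):
    return min(ready, key=priority_key)


a = ReadyTask("A", asap=0, alap=5, reads=2, writes=2, successors=1)
b = ReadyTask("B", asap=3, alap=3, reads=0, writes=4, successors=0)
c = ReadyTask("C", asap=1, alap=6, reads=5, writes=2, successors=0)
print(select([a, b, c]).name, select([a, c]).name)  # B C
```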

4.2  Retiming  Heuristic 

Retiming increases the distance of a dependence and produces an equivalent DDG, which satisfies the following condition:

Two graphs, $DDG = G(V, E, \lambda, \delta)$ and $DDG' = G(V, E, \lambda', \delta')$, are equivalent if the following equation holds for all $(u, v) \in E$:

$$\lambda(v) - \lambda(u) + \delta(u, v) = \lambda'(v) - \lambda'(u) + \delta'(u, v)$$

We perform retiming when we are unable to schedule a DDG in the given initiation interval II. A successful schedule for a DDG can be obtained by decreasing the number of dependencies that constrain the schedule. By retiming, we can transform a PSD into an FSD or NSD. The drawback of retiming is that it increases the memory requirement of the schedule. We can limit this increase by using good heuristics to select the dependency to be retimed, but this is not enough. To produce an equivalent DDG, other dependencies might need to be retimed as well, and the increase in memory requirement due to these dependencies should also be minimized. During retiming we do not increase the distance of a dependence belonging to a recurrence. We also ensure that no dependency has $\delta < 0$.


We retime in two steps. In the first step we heuristically select a dependency to be retimed. Increasing the distance of a dependence necessitates changing the $\lambda$ and $\delta$ of other tasks and dependencies. In a DDG there might exist a number of sets of dependencies whose distance could be increased to obtain an equivalent retimed DDG. In step 2 we select the set of dependencies which, on retiming, results in the least increase in memory requirement. As a first step towards retiming we select a dependency to be retimed. The priority of a dependency to be retimed depends on the following four properties, in decreasing order:

1. Dependency is a PSD: The primary objective of retiming is to reduce scheduling constraints in the DDG and give the scheduler greater freedom in scheduling tasks on the resources. Only PSDs constrain scheduling, and therefore only PSDs are retimed.

2. Dependency between tasks bound to heterogeneous resources: Increasing the distance of a dependency between tasks mapped to the same resource does not necessarily help the scheduler, because the two tasks still have to be scheduled on the same resource, one after the other. On the other hand, retiming a dependency between tasks mapped to different resources definitely gives more freedom to the scheduler.

3. Dependency whose predecessor task has a greater sum of height and depth ($H(u) + D(u)$): The sum of the height ($H(u)$) and depth ($D(u)$) of a task gives the length of the positive path to which it belongs. Increasing the distance of a dependency whose predecessor task has a greater sum $H(u) + D(u)$ reduces the length of a longer positive path in the DDG.

4. Dependency representing the least number of variables transferred: A secondary objective of the retiming transformation is to minimize the increase in the memory requirement of the DDG. Hence we select a dependency representing fewer transferred variables.
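The ranking can be sketched as follows, assuming each candidate dependency has been summarized in a hypothetical `Dep` record holding its distance, recurrence membership, the bindings of both endpoints, $H(u) + D(u)$ and its variable count. Property 1 acts as a filter (only PSDs outside recurrences are eligible), and properties 2 to 4 form the tie-breaking key.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dep:
    # Hypothetical summary of one dependency (u, v) in the DDG.
    u: str
    v: str
    delta: int           # dependence distance
    in_recurrence: bool  # dependencies on a cycle are never retimed
    u_bind: str          # "hw" or "sw"
    v_bind: str
    path_len_u: int      # H(u) + D(u) for the predecessor task u
    var: int             # number of variables transferred


def pick_dependency(deps: List[Dep]) -> Optional[Dep]:
    # Property 1: only PSDs (delta == 0) constrain the schedule; recurrences are skipped.
    candidates = [d for d in deps if d.delta == 0 and not d.in_recurrence]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda d: (
            d.u_bind == d.v_bind,  # property 2: heterogeneous (HW/SW) pairs first
            -d.path_len_u,         # property 3: longer positive path first
            d.var,                 # property 4: fewer variables transferred first
        ),
    )
```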

We use property 1 to select the dependencies to be retimed, and properties 2, 3 and 4 (in that order) to break ties. Given a dependency $e = (u, v)$ to be retimed, we define the following four sets with respect to $u$:

$$V_c = \{\, x \mid x \text{ is in the connected component to which } u \text{ belongs} \,\}$$




Figure 5: P, S and R sets during retiming of dependency (u,v)


$$P = \{\, x \in V_c \mid \text{there is a path from } x \text{ to } u \,\} \cup \{u\}$$

$$S = \{\, x \in V_c \mid \text{there is a path from } u \text{ to } x \,\}$$

$$R = V_c - (P \cup S)$$
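A minimal sketch of computing the four sets with breadth-first search, assuming the DDG is given as a list of (predecessor, successor) pairs; the helper names are hypothetical. It presumes that u is not on a recurrence (as the heuristic requires); otherwise P and S would overlap.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, List, Set


def reachable(start: Hashable, adj: Dict[Hashable, List[Hashable]]) -> Set[Hashable]:
    """Vertices reachable from start by following one or more edges (BFS)."""
    seen: Set[Hashable] = set()
    queue = deque([start])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, []):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


def retiming_sets(u, edges):
    succ, pred, both = defaultdict(list), defaultdict(list), defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
        pred[b].append(a)
        both[a].append(b)
        both[b].append(a)
    vc = reachable(u, both) | {u}  # connected component of u, ignoring direction
    p = reachable(u, pred) | {u}   # tasks with a path to u, plus u itself
    s = reachable(u, succ)         # tasks reachable from u
    r = vc - (p | s)               # everything else in the component
    return vc, p, s, r
```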

Figure  5  gives  an  illustration  of  the  four  sets.  We 
can  retime  the  dependency  e  =  (u,v)  by  the  following 
three  equations. 

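$$\lambda(u) \leftarrow \lambda(u) + 1$$

$$\delta(u, x) \leftarrow \delta(u, x) + 1, \quad \forall x \in V \text{ such that } (u, x) \in E$$

$$\delta(x, u) \leftarrow \delta(x, u) - 1, \quad \forall x \in V \text{ such that } (x, u) \in E$$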
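The following sketch applies the three equations to a single task and shows why they are not used on their own: on a hypothetical chain p -> u -> v with all distances zero, the graph stays equivalent, but the incoming dependency (p, u) ends up with δ = -1, which the heuristic forbids. The function name is illustrative.

```python
def retime_single(u, lam, delta):
    """Apply the three single-task retiming equations; returns new (lam, delta) copies."""
    lam2, delta2 = dict(lam), dict(delta)
    lam2[u] += 1
    for (a, b) in delta:
        if a == u:
            delta2[(a, b)] += 1  # outgoing dependencies now span one more iteration
        if b == u:
            delta2[(a, b)] -= 1  # incoming dependencies span one fewer (may go negative)
    return lam2, delta2
```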

Applying the three equations results in an equivalent DDG. However, the third equation decreases the distance of some dependencies. This can be avoided by increasing the $\lambda$ of all tasks in $P$ and increasing the $\delta$ of all dependencies whose predecessor task is in $P$ and whose successor is in $R \cup S$; this is the cutset c1 in Figure 5. Another way to retime is to increase the $\lambda$ of all tasks in $P \cup R$ and the $\delta$ of all dependencies whose predecessor is in $P \cup R$ and whose successor is in $S$; this is the cutset c2 in Figure 5. However, it is possible that neither c1 nor c2 gives the minimum increase in memory. We can obtain another cutset, c3 (see Figure 5), by partitioning the set $R$ between $P$ and $S$ so that the memory increase is minimized. We use a simulated-annealing-based partitioner. The cost function being minimized is defined as follows. For a cut $c_i = \{e_1, e_2, \ldots, e_n\}$, the cutsize cost is given by:

$$\text{Cost} = \sum_{j=1}^{n} var(e_j)$$

$var(e_j)$ is the number of variables carried by dependency $e_j$. In the above cost function, the sum gives the extra memory required by the LCDs after retiming. After partitioning $R$ between $P$ and $S$, we retime using the following two equations:

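$$\forall u \in P,\quad \lambda(u) \leftarrow \lambda(u) + 1$$

$$\forall (u, v) \in E \text{ with } u \in P,\ v \notin P,\quad \delta(u, v) \leftarrow \delta(u, v) + 1$$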
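The cut-based retiming can be sketched as below. The text uses a simulated-annealing partitioner; this sketch swaps in exhaustive enumeration of the splits of R, which is exponential and only meant for small illustrations. The validity check, which rejects any split in which an edge enters the retimed set from outside (that edge's δ would have to decrease), is our assumption about what a legal partition must satisfy. All function names are hypothetical.

```python
from itertools import product


def cut_cost(retimed, edges, var):
    """Extra LCD memory: sum of var(e) over edges leaving the retimed set."""
    return sum(var[e] for e in edges if e[0] in retimed and e[1] not in retimed)


def is_valid(retimed, edges):
    """Reject cuts where some edge enters the retimed set (its delta would drop)."""
    return not any(a not in retimed and b in retimed for (a, b) in edges)


def best_cut(P, R, edges, var):
    """Exhaustively assign each task of R to the P side or the S side."""
    best = None
    r_list = sorted(R)
    for choice in product((False, True), repeat=len(r_list)):
        retimed = set(P) | {r for r, to_p in zip(r_list, choice) if to_p}
        if not is_valid(retimed, edges):
            continue
        cost = cut_cost(retimed, edges, var)
        if best is None or cost < best[0]:
            best = (cost, retimed)
    return best  # (cost, set of tasks whose lambda is incremented)


def apply_retiming(retimed, lam, delta):
    """The two update equations: shift every retimed task, extend the cut edges."""
    lam2, delta2 = dict(lam), dict(delta)
    for x in retimed:
        lam2[x] += 1
    for (a, b) in delta:
        if a in retimed and b not in retimed:
            delta2[(a, b)] += 1
    return lam2, delta2
```

For example, with hypothetical edges p->u, u->v, v->s, p->r, r->s carrying 1, 8, 1, 10 and 2 variables respectively, and P = {p, u}, R = {r}, cutset c1 costs 18 variables while moving r to the retimed side (cutset c2) costs 10, so the sketch picks the latter.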

5  Experimental  Results 

We demonstrate the effectiveness of the tool in a codesign flow by considering the design of a JPEG-like [4] compression algorithm. The DDG of the specification is shown in Figure 6. It consists of four tasks: Forward Discrete Cosine Transform (FDCT), Quantization, Zig-Zag, and RLE and Huffman encoding. All the dependencies have $\delta = 0$, and the number of variables transferred across each dependency is 16. The memory read time is 16 ns and the memory write time is 24 ns. The run times of the various tasks in SW and HW are shown in Table 1 [6]. Table 2 shows the comparison between throughput and
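
| No. | Number of Tasks | Depth | Non-Pipeline Time (ns) | Non-Pipeline Memory | Pipeline MII (ns) | Pipeline II (ns) | Pipeline Memory | Speed-up (%) | Memory Incr. (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 1 | 110 | 8 | 90 | 90 | 16 | 18 | 100 |
| 2 | 3 | 2 | 390 | 7 | 240 | 290 | 14 | 34.5 | 100 |
| 3 | 5 | 3 | 230 | 17 | 490 | 190 | 34 | 17.4 | 100 |
| 4 | 6 | 3 | 1410 | 30 | 1135 | 1170 | 75 | 17 | 150 |
| 5 | 8 | 5 | 750 | 170 | 600 | 600 | 190 | 20 | 11.7 |
| 6 | 8 | 7 | 890 | 10 | 730 | 730 | 20 | 18 | 100 |
| 7 | 8 | 7 |  | 10 | 425 | 470 | 20 | 36 | 100 |
| 8 | 8 | 7 |  | 5 | 465 | 580 | 20 | 35 | 300 |
| 9 | 9 | 4 | 1130 | 15 | 842 | 931 | 30 | 17.6 | 100 |
| 10 | 10 | 6 | 390 | 34 | 300 | 300 | 43 | 23 | 26 |
| 11 | 15 | 7 | 950 | 76 | 770 | 770 | 97 | 18.9 | 27.6 |
| 12 | 15 |  |  | 52 | 860 | 860 | 66 | 33.3 | 26.9 |
| 13 | 20 | 7 |  | 129 | 870 | 870 | 182 | 27.5 | 41 |
| 14 | 20 | 14 | 1150 | 96 | 1010 | 1010 | 104 | 12.2 | 8.3 |
| 15 |  | 6 | 6320 | 534 | 5640 | 5640 | 794 | 10.8 | 48.6 |

Table 3: Comparison between Non-Pipeline and Pipeline Implementations for Random Graphs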


2, 4, 7, 8 and 9) we were not able to obtain pipeline schedules with MII as their initiation interval. This is because of memory conflicts during scheduling and recurrences in the graph. Memory conflicts force the scheduler to defer a read or a write operation, thereby increasing II. Dependencies belonging to recurrences are not retimed, hence they constrain the scheduler, leading to an increase in II. The increase in the memory requirement of a pipeline schedule is due to the extra memory required to store data items between two iterations of the steady state. It is quite common for the increase to be in the region of 100 to 300 percent. Speed-up due to pipelining was achieved for all graphs. For some graphs (rows 5, 10, 11, 12 and 13) a good speed-up was achieved with a low memory increment, making them ideal candidates for pipelined implementation.
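The percentage columns of Table 3 appear consistent, for most rows, with speed-up = (non-pipelined time - II) / non-pipelined time and memory increase = (pipelined memory - non-pipelined memory) / non-pipelined memory. These formulas are inferred from the tabulated numbers, not stated in the text; for example, row 5 gives (750 - 600)/750 = 20% and (190 - 170)/170 ≈ 11.76%, listed as 11.7.

```python
def speedup_pct(nonpipe_time: float, ii: float) -> float:
    """Reduction in time per iteration relative to the non-pipelined design, in %."""
    return 100.0 * (nonpipe_time - ii) / nonpipe_time


def memory_increase_pct(nonpipe_mem: float, pipe_mem: float) -> float:
    """Relative growth of the memory requirement caused by pipelining, in %."""
    return 100.0 * (pipe_mem - nonpipe_mem) / nonpipe_mem
```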

6  Conclusion 

In this paper we have presented a pipeline scheduling technique for optimizing the throughput and memory requirements of HW-SW codesigns. The effectiveness of the technique was demonstrated by experimentation. This technique will be an integral part of a larger codesign tool now under development. Future work will involve extending the technique to general multiple-ASIC architectures with different communication protocols.
RECOD: A Retiming Heuristic To Optimize Resource And Memory Utilization In HW/SW Codesigns

Abstract 

Hardware/Software designs of embedded systems are characterized by stringent performance constraints. Pipelined implementation is an effective way of maximizing the performance of a design. In this paper we present a retiming heuristic to obtain pipelined schedules for hardware-software codesigns. The heuristic aims at maximizing the throughput of a resource-constrained codesign while minimizing its memory usage. The effectiveness of the proposed technique is demonstrated by experimentation.
Figure 2: Non-Pipelined and Pipelined Implementations of a Task Graph


the task. The memory requirement of the implementation is the maximum memory used by one iteration of the loop (shown by the dotted line in the figure). This happens when tasks 2 and 3 execute in parallel. Task 2 needs memory for 20 data items and task 3 needs memory for 10 data items. At this point in time, the variables transferred from task 1 to task 4 (10 data items) are also stored in memory. Hence the maximum memory used by the implementation is 40 data items. Now consider a pipelined implementation of the same task graph (lower right corner of the figure). A pipelined execution of a design can be divided into three parts. The first part, which loads the pipeline, is called the prologue. The second part is the steady state, which is executed several times. Finally, the last part, which drains the pipeline, is called the epilogue. As shown in the figure, the execution of task 4 belonging to the first iteration of the loop is overlapped with the execution of task 1 belonging to the second iteration. Once the pipeline is fully loaded, the steady state completes one iteration of the loop every 200 t-units, a definite improvement over the previous design. The drawback is that the memory requirement has increased to 70 data items (shown by the dotted arrow line).

We implement pipelined designs using the retiming transformation. Retiming to generate a pipelined design is considered a generalization [3] of the classical transformation introduced by Leiserson and Saxe [10]. A similar problem is the software pipelining problem [9] in code generation for VLIW architectures. A task graph to be pipelined can generally be retimed in more than one way, and we need to select the retiming that gives the least increase in memory requirements. In this paper we present a Retiming heuristic for optimal resource and memory utilization in HW/SW Codesigns (RECOD).

In this paper we concentrate on the design of DSP applications. DSP applications have moderately simple algorithms but demand high performance and throughput, which necessitates a search for efficient and inexpensive implementations [13]. Besides, many of these applications are loop oriented, with a single block of code executed a number of times on different sets of data, which makes them ideal candidates for pipelined implementation.

In this paper we assume that the SW processor and the ASIC in the codesign architecture are themselves non-pipelined with respect to task execution. We also assume that the pipeline schedule is rectangular in nature, that is, a new iteration of the steady state does not begin before the previous one is over. In a non-rectangular schedule, the execution of a task belonging to one iteration of the steady state overlaps with the execution of a task belonging to the next iteration.

The  paper  is  organized  as  follows.  In  Section  2  we  discuss  previous  work,  in  Section  3  we  describe  the 
graph  representation  and  pipeline  schedule,  Section  4  presents  RECOD,  experimental  results  are  in  Section 
5  and  finally  Section  6  concludes  the  paper. 

2  Previous  Work 

The term “Retiming” was introduced by Leiserson and Saxe [10], who used it to solve the problem of optimizing the throughput of synchronous circuitry. Retiming described the redistribution of register delays between combinational blocks in a synchronous circuit, and they developed an ILP formulation to solve the problem. Since then the retiming transformation has been used extensively in logic synthesis [11], high level synthesis [15] [17], HW-SW codesign [18] and DSP applications [7] [8]. Pipelining is considered a generalization of the retiming problem in which circuit latency is allowed to increase by allowing a change in the production and consumption times of output and input signals, respectively [3].

The term “Software Pipelining” was introduced by M. Lam [9], who used it to describe a loop scheduling technique for code generation on VLIW processors. In software pipelining, multiple iterations of the loop, in various stages of their execution, are in progress simultaneously. This description relates it very closely to pipelining in hardware systems. Since then a number of heuristics [1] [6] and ILP formulations [4] [12] have been proposed to solve the software pipelining problem. [16] gives a good comparison and survey of the techniques, and [2] establishes a link between circuit retiming and software pipelining.

The work that comes closest to this paper is that of Sanchez [17]. In that work, Sanchez used a retiming heuristic in a high level synthesis tool that aims at obtaining pipelined designs with optimum throughput. The heuristic retimes the head or tail dependency of the maximum positive path in a graph. In this paper we present a new retiming heuristic which optimizes both the throughput and the memory requirements of pipelined codesign applications. Our heuristic retimes in two steps. In the first step it selects a dependency to be retimed which gives the maximum freedom to the scheduler. In the second step it selects the other dependencies (in addition to the first one) which, on retiming, result in an equivalent graph with the least increase in shared memory requirements. Experimental results show that our retiming strategy produces designs which use significantly less memory and operate at the optimum throughput rate.
3 Graph Representation and Pipeline Scheduling

Graph Representation The input specification is captured by an intermediate graph format called the Data Dependency Graph (DDG). It represents the tasks by vertices and the data dependencies between tasks by directed edges. The vertices carry information about the task binding (HW or SW), the HW run time and the SW run time. The edges carry information about the number of variables in a dependence. Since we are interested in pipelining the design, we associate with each vertex an iteration index ($\lambda$) and with each edge a dependency distance ($\delta$). The iteration index of a task $u$, $\lambda(u)$, indicates that at the $i$th iteration of the steady state, the instance of task $u$ belonging to the $(i + \lambda(u))$th iteration of the loop is executed. For example, consider the pipelined design in Figure 2. In the first iteration of the steady state, the instance of task 2 belonging to the second iteration of the loop is executed, hence $\lambda(\text{task2}) = 1$. Similarly, $\lambda(\text{task1}) = 1$, $\lambda(\text{task3}) = 1$ and $\lambda(\text{task4}) = 0$. The dependence distance of an edge $e$, $\delta(e)$, indicates the number of steady-state iterations the dependence spans. In Figure 2 the data produced by task 1 at the $i$th iteration of the steady state is consumed by task 4 at the $(i + 1)$th iteration of the steady state. Hence the dependence distance of edge (1, 4) is $\delta(1, 4) = 1$. Similarly, $\delta(1, 2) = 0$, $\delta(1, 3) = 0$, $\delta(2, 4) = 1$ and $\delta(3, 4) = 1$. We now formalize the DDG representation as follows:

A DDG is a 4-tuple $DDG = G(V, E, \lambda, \delta)$, where:

• $V$ is the set of vertices. Each vertex $u \in V$ represents a task. For each task $u \in V$ the following information is available:

  - $u_{bind}$: The binding of the task, that is, whether it is going to be implemented in HW or SW.

  - $u_{sw}$: The SW run time of the task for a particular input data set on the general purpose processor.

  - $u_{hw}$: The HW run time of the task if it were implemented as an ASIC, for the same input data.

• $E$ is the set of directed edges. Each $e = (u, v) \in E$ represents a data dependence between tasks $u$ and $v$. Every edge carries the number of variables ($e_{var}$) represented by the dependence.

• $\lambda$ and $\delta$ are two mappings, $\lambda : V \to \mathbb{N}$ and $\delta : E \to \mathbb{N}$, representing the iteration index ($\lambda$) and the number of iterations traversed by the dependence ($\delta$), also called the dependence distance. $\mathbb{N}$ is the set of natural numbers.

Initially, $\forall u \in V$, $\lambda(u) = 0$. Notice that the representation has no control flow constructs; it is strictly data flow.
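As an illustration, the DDG can be held in a small Python structure. This is a sketch with hypothetical class and field names; the run times and variable counts one would pass in are placeholders, while the λ and δ values for Figure 2 are the ones given above. The `consistent` check assumes the original loop body had no loop-carried dependencies (all δ = 0 before retiming), so that a consumer running δ(u, v) steady-state iterations after its producer works on the same loop iteration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Edge = Tuple[int, int]


@dataclass
class Task:
    bind: str  # "hw" or "sw"
    sw: int    # SW run time on the general purpose processor
    hw: int    # HW run time as an ASIC


@dataclass
class DDG:
    tasks: Dict[int, Task]
    var: Dict[Edge, int]                                # e_var for each edge (u, v)
    lam: Dict[int, int] = field(default_factory=dict)   # iteration index lambda
    delta: Dict[Edge, int] = field(default_factory=dict)

    def __post_init__(self):
        # Initially lambda(u) = 0 for every task; unspecified distances default to 0.
        for u in self.tasks:
            self.lam.setdefault(u, 0)
        for e in self.var:
            self.delta.setdefault(e, 0)

    def loop_iteration(self, u: int, i: int) -> int:
        """Loop iteration of task u executed in steady-state iteration i."""
        return i + self.lam[u]

    def consistent(self) -> bool:
        """Producer and consumer (delta iterations later) handle the same loop iteration."""
        return all(
            self.loop_iteration(u, 0) == self.loop_iteration(v, self.delta[(u, v)])
            for (u, v) in self.var
        )
```

With λ = {1: 1, 2: 1, 3: 1, 4: 0}, `loop_iteration(2, 1)` returns 2, matching the statement that the first steady-state iteration executes task 2 of the second loop iteration.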

Theoretical Upper Bound on Throughput Given a DDG, there exists a theoretical upper bound on the throughput of a pipeline schedule of the graph [17]. It is called the maximum execution throughput (MaxTh), and it gives the maximum number of iterations of the steady state per time unit. The reciprocal of MaxTh is called the minimum initiation interval (MII). For a particular pipeline implementation, the initiation interval, II, is the time taken for one iteration of the steady state. For example, in Figure 2 the pipelined implementation has II = 200 t-units. The MII is limited by two factors. First, the number of resources (HW or SW) limits MII. This is called the resource-constrained MII, ResMII. Consider again the example shown in Figure 2. The task graph has two tasks, 1 and 3, bound to SW. Hence we need at least 200 t-units to complete the execution of tasks 1 and 3. Similarly, we need at least 200 t-units to complete the execution of tasks 2 and 4 in HW. The $ResMII_i$ due to a resource $i$ is given by the ratio of the sum of the latencies of all the tasks executing on resource $i$ to the total number of instances of resource $i$ [17]. The latency of a task $u$, $L_u$, is the total execution time of the task: the sum of the task's read time, its execution time on the resource it has been bound to, and its write time. The read (write) time of a task is the product of the number of variables read (written) by the task and the memory read (write) time. Hence we have,

$$L_u = u_i + u_{rdtime} + u_{wrtime}, \quad \text{where } u_i = \begin{cases} u_{sw} & \text{if } u_{bind} = sw \\ u_{hw} & \text{if } u_{bind} = hw \end{cases}$$
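A direct transcription of the latency formula, as a sketch: the argument names are hypothetical and the binding is passed as the string "sw" or "hw".

```python
def latency(bind: str, sw_time: float, hw_time: float,
            n_read: int, n_written: int,
            mem_read_time: float, mem_write_time: float) -> float:
    """L_u = u_i + u_rdtime + u_wrtime, with u_i chosen by the task's binding."""
    if bind not in ("sw", "hw"):
        raise ValueError("binding must be 'sw' or 'hw'")
    exec_time = sw_time if bind == "sw" else hw_time  # u_i
    rd_time = n_read * mem_read_time                  # u_rdtime
    wr_time = n_written * mem_write_time              # u_wrtime
    return exec_time + rd_time + wr_time
```

With memory timings like those used in the experiments (16 ns read, 24 ns write), 16 variables read and 16 written, and a hypothetical HW run time of 200 ns, the latency is 200 + 16·16 + 16·24 = 840 ns.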


Since the codesign architecture has only one HW and one SW resource, we can calculate $ResMII_{HW}$ and $ResMII_{SW}$ as the sum of the latencies of all tasks bound to HW and SW, respectively. ResMII for a DDG is the maximum of all the $ResMII_i$; therefore $ResMII = \max(ResMII_{HW}, ResMII_{SW})$. Second, recurrences (cycles) in a task graph also limit MII. This is called the recurrence-constrained MII, RecMII. Suppose that in Figure 2 the data produced by task 4 in the $i$th iteration of the loop is consumed by task 1 in the $(i + 1)$th iteration; that is, let us add an edge $e = (4, 1)$ with $\delta(4, 1) = 1$ to the task graph. The schedule shown in the figure then becomes invalid, because we can no longer overlap the execution of task 1 and task 4. In fact, any schedule of the graph now takes at least 325 t-units. The $RecMII_r$ for a recurrence $r$ is given by the ratio of the sum of the latencies of the tasks in the recurrence to the sum of the distances ($\delta$) of all the dependencies in the recurrence [17]. A graph may have more than one cycle, and RecMII is then the maximum of the $RecMII_r$ over all of them, that is, $RecMII = \max_r(RecMII_r)$ over all recurrences $r$ in the DDG. The MII is then the maximum of ResMII and RecMII. That is,


$$MII = \max(ResMII, RecMII) \implies MaxTh = \frac{1}{\max(ResMII, RecMII)}$$
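The MII computation can be sketched as follows for the one-HW, one-SW architecture. Cycles are enumerated with a plain depth-first search, which is exponential in the worst case and only suitable for small task graphs; a production tool would more likely use Johnson's cycle-enumeration algorithm or a minimum cycle-ratio method. Function names are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def simple_cycles(nodes: List[Hashable],
                  succ: Dict[Hashable, List[Hashable]]) -> List[List[Hashable]]:
    """Enumerate each simple cycle once, starting from its lowest-ordered vertex."""
    order = {v: i for i, v in enumerate(nodes)}
    cycles: List[List[Hashable]] = []

    def dfs(start, v, path, on_path):
        for w in succ.get(v, []):
            if w == start:
                cycles.append(list(path))
            elif order[w] > order[start] and w not in on_path:
                on_path.add(w)
                path.append(w)
                dfs(start, w, path, on_path)
                path.pop()
                on_path.remove(w)

    for s in nodes:
        dfs(s, s, [s], {s})
    return cycles


def res_mii(latency: Dict[Hashable, float], bind: Dict[Hashable, str]) -> float:
    """One HW and one SW resource: ResMII is the larger per-resource latency sum."""
    hw = sum(lat for u, lat in latency.items() if bind[u] == "hw")
    sw = sum(lat for u, lat in latency.items() if bind[u] == "sw")
    return max(hw, sw)


def rec_mii(latency: Dict[Hashable, float], delta: Dict[Edge, int]) -> float:
    """Maximum over recurrences of (sum of latencies) / (sum of distances)."""
    succ: Dict[Hashable, List[Hashable]] = {}
    for a, b in delta:
        succ.setdefault(a, []).append(b)
    worst = 0.0
    for cyc in simple_cycles(list(latency), succ):
        lat = sum(latency[x] for x in cyc)
        dist = sum(delta[(cyc[k], cyc[(k + 1) % len(cyc)])] for k in range(len(cyc)))
        if dist == 0:
            return float("inf")  # a cycle with zero total distance can never be scheduled
        worst = max(worst, lat / dist)
    return worst


def mii(latency, bind, delta) -> float:
    return max(res_mii(latency, bind), rec_mii(latency, delta))
```

For hypothetical latencies L1 = 100, L2 = 150, L3 = 100, L4 = 50 with tasks 1 and 3 on SW, ResMII = 200. Adding the recurrence edge (4, 1) with δ = 1 creates the cycle 1 -> 2 -> 4 -> 1 with ratio 300/1, so MII rises to 300.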


Pipeline Schedule The pipeline schedule of a task graph is characterized by its initiation interval II. The schedule is an assignment of start times $S(u)$ to tasks such that, for all tasks $u$ in the graph, $0 \le S(u) \le II - 1$. For a dependency $(u, v)$, the schedule times of $u$ and $v$ must honor the data dependence, that is,

$$S(v) + \delta(u, v) \cdot II \ge S(u) + L_u \iff S(v) \ge S(u) + L_u - \delta(u, v) \cdot II$$
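A sketch of checking a candidate schedule against these constraints, assuming start times, latencies and distances are stored in dictionaries; the function name is hypothetical.

```python
def check_schedule(start, latency, delta, ii):
    """Return the list of violated dependencies (empty if the schedule is valid)."""
    if any(not (0 <= s <= ii - 1) for s in start.values()):
        raise ValueError("every start time must lie in [0, II - 1]")
    return [
        (u, v) for (u, v), d in delta.items()
        # S(v) >= S(u) + L_u - delta(u, v) * II must hold for every dependency
        if start[v] < start[u] + latency[u] - d * ii
    ]
```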

As we will see in the next paragraph, not all dependencies constrain a pipeline schedule; the dependencies which do not constrain a schedule can be ignored during scheduling. We obtain a pipeline schedule by scheduling [5] and retiming in an iterative manner, as shown in Figure 3. We calculate the MII and try scheduling the DDG for MII. However, due to constraining dependencies we may not be able to schedule the DDG in MII. If we cannot, we retime the DDG and try again. The objective of retiming is to reduce the number of schedule-constraining dependencies.
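The iterative flow can be sketched as a driver that takes the scheduler and the retiming step as callbacks, since neither is specified in full here. Two choices in this sketch are assumptions rather than details taken from Figure 3: when no dependency can be retimed, II is relaxed by one time unit (the experiments do report II > MII for some graphs), and each new II restarts from the unretimed DDG. The function and parameter names are hypothetical.

```python
from typing import Callable, Optional, Tuple, TypeVar

G = TypeVar("G")  # whatever DDG representation the caller uses
S = TypeVar("S")  # whatever schedule representation the scheduler returns


def pipeline_schedule(ddg: G, mii: int,
                      try_schedule: Callable[[G, int], Optional[S]],
                      retime: Callable[[G], Optional[G]],
                      max_ii: int) -> Tuple[int, G, S]:
    """Alternate scheduling and retiming until a schedule for some II is found."""
    ii = mii
    while ii <= max_ii:
        current = ddg                # assumption: restart from the input DDG at each II
        while True:
            sched = try_schedule(current, ii)
            if sched is not None:
                return ii, current, sched
            retimed = retime(current)
            if retimed is None:      # no retimable PSD left at this II
                break
            current = retimed
        ii += 1                      # assumption: relax II by one time unit and retry
    raise RuntimeError(f"no schedule found for II in [{mii}, {max_ii}]")
```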
Figure 3: Pipeline Scheduling by Iterative Retiming


Schedule Constraining Dependencies Depending on whether $\delta(u, v)$ is equal to or greater than zero, a data dependency $(u, v)$ may or may not constrain a pipeline schedule. A dependency with $\delta(u, v) = 0$ constrains a pipeline schedule, because the constraint then becomes $S(v) \ge S(u) + L_u$, so $v$ must start at least $L_u > 0$ time units after $u$. Essentially, a data dependence with $\delta(u, v) = 0$ implies that the data produced by the predecessor task $u$ is consumed by the successor task $v$ in the same iteration of the steady state, and hence it constrains the schedule. Such a dependency is called a positive scheduling dependency (PSD) [17] or intra-loop dependency (ILD). A dependency $(u, v)$ with $\delta(u, v) > 0$ gives us two cases. First, consider a dependency with $\delta(u, v) > 0$ and $L_u - II \cdot \delta(u, v) \le -(II - 1)$. Such a dependency does not constrain a pipeline schedule, since the data dependence is satisfied for all values of $S(u)$ and $S(v)$, that is,

If $\delta(u, v) > 0$ and $L_u - II \cdot \delta(u, v) \le -(II - 1)$, then

$$S(v) \ge S(u) + L_u - \delta(u, v) \cdot II, \quad \forall S(u), S(v) \in [0, II).$$

Such a dependency is called a free scheduling dependency (FSD) [17]. Now consider a dependency with $\delta(u, v) > 0$ and $-(II - 1) < L_u - II \cdot \delta(u, v) \le 0$. Such a dependency is called a negative scheduling dependency (NSD) [17], and it constrains a pipeline schedule under two conditions. First, if the pipeline schedule is non-rectangular, NSDs constrain the schedule. Second, if the resources on which tasks $u$ and $v$ execute are themselves pipelined, NSDs constrain the schedule. Since neither of these two conditions holds in our case, NSDs do not constrain the pipeline schedule. FSDs and NSDs together are called loop carried dependencies (LCDs), since they represent a data dependence between tasks executing in different iterations of the steady state. Hence, for a given initiation interval II, the set of schedule-constraining dependencies, $E_s$, is the set of PSDs in the DDG, that is,

$$E_s = \{(u, v) \in E \mid \delta(u, v) = 0\}$$
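The classification can be sketched as a small function. It assumes $L_u \le II \cdot \delta(u, v)$ for every loop-carried dependency, which holds when each task fits within one initiation interval on a non-pipelined resource; other inputs are rejected. The function name is hypothetical.

```python
def classify(delta: int, latency_u: float, ii: int) -> str:
    """Classify dependency (u, v) as PSD, FSD or NSD for a given II."""
    if delta < 0:
        raise ValueError("retiming never produces negative distances")
    if delta == 0:
        return "PSD"   # intra-loop dependency: it constrains the schedule
    slack = latency_u - ii * delta
    if slack > 0:
        raise ValueError("L_u exceeds II * delta; not covered by the definitions")
    if slack <= -(ii - 1):
        return "FSD"   # satisfied for every S(u), S(v) in [0, II)
    return "NSD"       # only matters for non-rectangular schedules or pipelined resources
```

Only the dependencies classified as "PSD" form $E_s$; FSDs and NSDs are the LCDs that the scheduler may ignore under the assumptions stated above.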
The initiation interval II of a pipeline schedule is constrained by the length of the maximum positive path (MPP) in the DDG. A path $p = \{e_1, \ldots, e_n\}$ is called a positive path if $\forall e \in p$, $e$ is a schedule-constraining dependency. The length of $p$ is:

$$Length(p) = L_w + \sum_{(u, v) \in p} L_u$$

where $L_w$ is the latency of the final (tail) task $w$ of the positive path. A maximal positive path, MPP, of a DDG is a positive path $p$ such that, for any other positive path $p' \subseteq E$, $Length(p) \ge Length(p')$. For a feasible schedule of a DDG with initiation interval II,

$$Length(MPP) \le II.$$
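The MPP length can be computed with one longest-path pass over the PSD subgraph in topological order. This is a sketch assuming the PSDs form an acyclic graph (a zero-distance cycle would make any schedule infeasible) and counting a task with no incident PSD as a path of length $L_u$; it uses `graphlib.TopologicalSorter` from the Python 3.9+ standard library, and the function name is hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def mpp_length(latency, delta):
    """Length of the maximum positive path, following only PSDs (delta == 0)."""
    preds = defaultdict(list)
    for (u, v), d in delta.items():
        if d == 0:
            preds[v].append(u)
    graph = {v: preds[v] for v in latency}  # node -> its PSD predecessors
    before = {}  # longest latency sum strictly before v on a positive path
    for v in TopologicalSorter(graph).static_order():
        before[v] = max((before[u] + latency[u] for u in graph[v]), default=0)
    return max(before[v] + latency[v] for v in latency)
```

A schedule with initiation interval II can only exist if `mpp_length(...) <= II`, which is why retiming PSDs on long positive paths is useful.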

Hence during retiming we should try to reduce the number of schedule-constraining dependencies which belong to a longer positive path. Before we present the retiming algorithm in the next section, we discuss the memory requirements of a pipeline schedule in the following paragraph.

Calculation of Memory Requirement We assume that memory is reserved for the write set of a task as soon as the task begins execution, and that it remains reserved until the task which uses the data finishes execution. In other words, memory is reserved for some data as soon as the producer task begins execution, and it is freed once the consumer task finishes execution. In a pipeline schedule the memory requirement is due to ILDs (PSDs) and LCDs (FSDs and NSDs). ILDs do not cross the boundary between two consecutive iterations of the steady state: all the data belonging to any ILD is produced and consumed within one iteration of the steady state. LCDs cross the boundary between two iterations of the steady state, and depending on their distance ($\delta$) they might cross more than one boundary. Hence, before an iteration of the steady state can begin execution, some memory is already occupied by LCD data, given by:

$$Mem_{LCD} = \sum_{e \in LCD} e_{var} \times \delta(e)$$

$Mem_{LCD}$ is the same at the beginning of each iteration of the steady state, so we need at least $Mem_{LCD}$ memory for the pipelined design. The memory required during one iteration of the steady state is the maximum amount of memory occupied by the data items during execution, $Mem_{exec}$. This memory is due to both ILDs and LCDs. The memory requirement of a pipelined design, MemReq, is then given by:

$$MemReq = \max(Mem_{LCD}, Mem_{exec})$$

As the above discussion shows, $Mem_{LCD}$ is a lower bound on the memory requirement of a pipeline schedule. During retiming we convert a schedule-constraining dependency (ILD) into an LCD, which does not constrain the schedule, thereby increasing $Mem_{LCD}$. Therefore during retiming we should try to minimize the increase in $Mem_{LCD}$.
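$Mem_{LCD}$ and MemReq can be computed as follows. This sketch takes $Mem_{exec}$ as an input, since it depends on the actual schedule; the function names are hypothetical.

```python
def mem_lcd(var, delta):
    """Memory already occupied by loop-carried data at the start of a steady-state iteration."""
    # PSDs (delta == 0) contribute nothing; each LCD holds e_var items per crossed boundary.
    return sum(var[e] * d for e, d in delta.items() if d > 0)


def mem_req(var, delta, mem_exec):
    """MemReq = max(Mem_LCD, Mem_exec); Mem_exec must come from the actual schedule."""
    return max(mem_lcd(var, delta), mem_exec)
```

With three hypothetical loop-carried edges of distance 1 carrying 10, 20 and 10 variables, $Mem_{LCD}$ = 40; retiming a PSD that carries 20 variables raises it to 60, which is the kind of increase the retiming heuristic tries to keep small.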

Each task in the DDG is bound to a unique resource. Hence ResMII is an achievable lower bound; in other words, we should be able to schedule the DDG in MII time when the binding is known (and $RecMII \le ResMII$). The general case, where the binding is unknown, increases the complexity of the scheduler. However, the retiming heuristic should work equally well in the general case.


4  RECOD:  Retiming  Heuristic  for  HW/SW  Codesigns 

We retime when we are unable to schedule a DDG in the given initiation interval, II. A successful schedule for a DDG can be obtained by decreasing the number of dependencies that constrain the schedule. By retiming we can transform a PSD into an FSD or NSD (LCDs) by increasing the dependence distance ($\delta$). LCDs do not constrain the schedule of an iteration of the steady state. During retiming we ensure that no dependency has $\delta < 0$. Retiming should also produce an equivalent DDG. Two graphs, $DDG = G(V, E, \lambda, \delta)$ and $DDG' = G(V, E, \lambda', \delta')$, are equivalent if, $\forall (u, v) \in E$, the following equation holds:

$$\lambda(v) - \lambda(u) + \delta(u, v) = \lambda'(v) - \lambda'(u) + \delta'(u, v)$$

Retiming produces a DDG with tasks belonging to different iterations. In other words, dependence retiming helps in pipelining a DDG.

The drawback of retiming is that it increases the memory requirement of the schedule. Since we now have tasks belonging to different iterations executing at the same time, we need more shared memory to store data between successive iterations of the steady state. We can limit this increase by using good heuristics to select the dependency to be retimed, but this is not enough: in order to produce an equivalent DDG, other dependencies might need to be retimed, and the increase in shared memory requirement due to these dependencies should also be minimized. Hence RECOD retimes in two steps. In the first step it heuristically selects a dependency to be retimed. Increasing the distance of a dependence necessitates changing the $\lambda$ and $\delta$ of other tasks and dependencies. Decreasing the $\delta$ of a dependence is likely to change it into a PSD, so during retiming we only increase the distance of dependencies. In a DDG there might exist a number of sets of dependencies whose distance could be increased to obtain an equivalent retimed DDG. In step 2 we select the set of dependencies which, on retiming, results in the least increase in shared memory requirement.

The distance of a dependency belonging to a recurrence in the DDG cannot be increased without decreasing the distance of some other dependency in that recurrence, because the equivalence condition keeps the sum of $\delta$ around every cycle unchanged (the $\lambda$ terms cancel around the cycle). Hence during retiming we do not increase the distance of a dependence belonging to a recurrence. A dependence not belonging to a recurrence, however, can be retimed without decreasing the distance of another dependence.

4.1  RECOD  Step  1:  Heuristic  To  Select  A  Dependency  For  Retiming  Transformation 

As a first step towards retiming we select a dependency to be retimed. The priority of a dependency for
retiming depends on the following four properties, in decreasing order of importance:
1. Dependency is a PSD.

The primary objective of RECOD is to reduce scheduling constraints in the DDG and give the scheduler
greater freedom in scheduling tasks on the resources. Only PSDs constrain scheduling. Hence the dependency
to be retimed should be a PSD, not an NSD or FSD.

2.  Dependency  between  tasks  bound  to  heterogeneous  resources. 

As mentioned above, the main objective of the retiming heuristic is to reduce scheduling constraints in the
graph. Increasing the distance of a dependency between tasks mapped to the same resource does not
necessarily help the scheduler: the two tasks have to be scheduled on the same resource and will execute one
after the other. In contrast, retiming a dependency between tasks mapped to different resources gives the
scheduler more freedom.

3.  Dependency  whose  predecessor  task  belongs  to  a  longer  positive  path. 

As discussed in the previous section, positive paths limit the II of a pipeline schedule. Increasing the
distance of a dependency whose predecessor task belongs to a longer positive path helps obtain a pipeline
schedule with a smaller II and therefore higher throughput.

4.  Dependency  representing  the  least  number  of  variables  transferred. 

A  secondary  objective  of  retiming  transformation  is  to  minimize  the  increase  in  memory  requirement 
of  the  DDG.  Increasing  the  distance  of  a  dependency  with  more  variables  definitely  results  in  a  larger 
increase  in  memory  requirement.  Hence  we  select  a  dependency  representing  fewer  variables  being 
transferred. 

We use property 1 to select dependencies to be retimed, and use properties 2, 3 and 4 (in that order) to
break ties.
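A minimal sketch of this selection rule: PSDs are filtered first, then ties are broken by properties 2, 3 and 4 through a lexicographic sort key. The `Dependency` record and its fields (in particular `pos_path_len`, assumed to be precomputed) are hypothetical; how PSDs and positive-path lengths are determined is outside the scope of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dependency:
    src: str            # predecessor task
    dst: str            # successor task
    is_psd: bool        # property 1
    src_resource: str   # resource the predecessor is bound to
    dst_resource: str   # resource the successor is bound to
    pos_path_len: int   # longest positive path through src (assumed precomputed)
    num_vars: int       # variables transferred across the dependency


def heuristic_select(deps: List[Dependency]) -> Optional[Dependency]:
    candidates = [d for d in deps if d.is_psd]      # property 1 selects
    if not candidates:
        return None                                 # nothing to retime

    def key(d: Dependency):
        heterogeneous = d.src_resource != d.dst_resource
        # min() prefers smaller keys: heterogeneous first (property 2),
        # longer positive path first (property 3), fewer variables (property 4).
        return (not heterogeneous, -d.pos_path_len, d.num_vars)

    return min(candidates, key=key)


deps = [
    Dependency("q", "z", True, "HW", "HW", 5, 8),
    Dependency("f", "q", True, "SW", "HW", 3, 16),
    Dependency("z", "r", True, "HW", "SW", 3, 4),
    Dependency("r", "x", False, "SW", "HW", 9, 1),   # not a PSD: never chosen
]
best = heuristic_select(deps)
print(best.src, best.dst)  # z r
```

The tuple key encodes the strict priority order: a later property is consulted only when all earlier ones tie.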

4.2 RECOD Step 2: Partitioning To Minimize Increase In Memory Requirement During Retiming

The primary objective of retiming is to give the scheduler greater freedom, which is achieved by the
heuristic described above. We now select the set of dependencies that gives the least increase in memory
requirement. Given a dependency $e = (u, v)$ to be retimed, we define the following four sets with respect
to u:

$$V_c = \{\text{connected component to which } u \text{ belongs}\}$$

$$P = \{v \in V_c \mid \text{there is a path from } v \text{ to } u\} \cup \{u\}$$

$$S = \{v \in V_c \mid \text{there is a path from } u \text{ to } v\}$$

$$R = V_c - (P \cup S)$$
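The four sets can be computed with plain graph searches. This sketch assumes the DDG is acyclic at this point (recurrences collapsed, as RECOD does before retiming) and treats "connected component" as the weakly connected component, i.e. edge directions are ignored. The function names and the example graph are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def _reach(start: str, adj: Dict[str, List[str]]) -> Set[str]:
    """Nodes reachable from `start` by following `adj` (start itself excluded)."""
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in adj[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


def retiming_sets(edges: Iterable[Edge], u: str):
    succ: Dict[str, List[str]] = defaultdict(list)
    pred: Dict[str, List[str]] = defaultdict(list)
    both: Dict[str, List[str]] = defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
        pred[b].append(a)
        both[a].append(b)
        both[b].append(a)
    vc = _reach(u, both) | {u}   # connected component of u, ignoring direction
    p = _reach(u, pred) | {u}    # tasks with a path to u, plus u
    s = _reach(u, succ)          # tasks reachable from u
    r = vc - (p | s)             # the rest of the component
    return vc, p, s, r


edges = [("a", "u"), ("u", "v"), ("v", "w"), ("b", "w"), ("a", "c"), ("x", "y")]
vc, p, s, r = retiming_sets(edges, "u")
print(sorted(p), sorted(s), sorted(r))  # ['a', 'u'] ['v', 'w'] ['b', 'c']
```

Tasks x and y lie in a different component and so belong to none of the sets.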

Figure 4 illustrates the four sets. We can retime the dependency $e = (u, v)$ using the following
three equations:

$$\lambda(u) \leftarrow \lambda(u) + 1$$

$$\delta(u,x) \leftarrow \delta(u,x) + 1, \quad \forall x \in V \text{ such that } (u,x) \in E$$

$$\delta(x,u) \leftarrow \delta(x,u) - 1, \quad \forall x \in V \text{ such that } (x,u) \in E$$

Figure 4: P, S and R sets during retiming of dependency (u, v)

Applying the three equations results in an equivalent DDG. However, the third equation decreases the
distance of some dependencies. This can be avoided by increasing the $\lambda$ of all tasks in $P$, that is,
$\forall x \in P,\ \lambda(x) \leftarrow \lambda(x) + 1$. To obtain an equivalent DDG we then need to increase the
$\delta$ of all dependencies whose predecessor task is in $P$ but whose successor is not, that is,
$\forall (x, y) \in E,\ x \in P,\ y \notin P:\ \delta(x,y) \leftarrow \delta(x,y) + 1$. This is the cutset c1 in
Figure 4. Another way to retime without decreasing the $\delta$ of any dependence is as follows:
$\forall x \in P \cup R,\ \lambda(x) \leftarrow \lambda(x) + 1$ and
$\forall (x, y) \in E,\ x \notin S,\ y \in S:\ \delta(x,y) \leftarrow \delta(x,y) + 1$.
This is the cutset c2 in Figure 4. However, neither cutset c1 nor c2 is guaranteed to give the minimum
increase in memory. We can obtain another cutset c3 (see Figure 4) by partitioning the set R into P and S so
that the memory increase is minimized. We use a simulated annealing based partitioner. The cost function
being minimized is defined as follows. For a cut $C_j = \{e_1, e_2, \ldots, e_n\}$, the cutsize cost is given by:

$$\text{Cost} = \sum_{j=1}^{n} \text{var}(e_j)$$
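The sketch below applies the closed-set retiming behind cutsets c1 and c2 and evaluates the cost function for each. It assumes each dependency is stored with its variable count in a dictionary; the example graph (with P = {a, u}, S = {v, w}, R = {b, c}) and its variable counts are invented for illustration, and `retime_closed_set` is a hypothetical helper.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def retime_closed_set(nvars: Dict[Edge, int], lam: Dict[str, int],
                      delta: Dict[Edge, int], moved: Set[str]):
    """Add 1 to lambda of every task in `moved` and 1 to delta of every
    dependency leaving `moved`; return (lam', delta', cost) where cost is
    the sum of var(e) over the retimed dependencies."""
    new_lam = {t: lam[t] + (1 if t in moved else 0) for t in lam}
    new_delta = dict(delta)
    cost = 0
    for (x, y), n in nvars.items():
        if x in moved and y not in moved:
            new_delta[(x, y)] += 1
            cost += n
        elif y in moved and x not in moved:
            # Such an edge would need delta - 1; `moved` must be predecessor-closed.
            raise ValueError(f"dependency {(x, y)} enters the moved set")
    return new_lam, new_delta, cost


nvars = {("a", "u"): 16, ("u", "v"): 16, ("v", "w"): 4, ("b", "w"): 4, ("a", "c"): 8}
lam = {t: 0 for t in "auvwbc"}
delta = {e: 0 for e in nvars}
P, S, R = {"a", "u"}, {"v", "w"}, {"b", "c"}

_, _, c1_cost = retime_closed_set(nvars, lam, delta, P)       # cutset c1
_, _, c2_cost = retime_closed_set(nvars, lam, delta, P | R)   # cutset c2
print(c1_cost, c2_cost)  # 24 20
```

Here c2 is cheaper than c1, but neither need be optimal, which is what motivates partitioning R.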

$\text{var}(e_j)$ is the number of variables transferred across the dependency $e_j$. In the above cost function
the sum gives the extra memory required by the LCDs after retiming. During partitioning we ensure that if a
task u is in partition P (S), then all its predecessors (successors) are also in partition P (S). After
partitioning the set R into sets P and S, we retime using the following two equations:

$$\forall x \in P,\ \lambda(x) \leftarrow \lambda(x) + 1$$

$$\forall (x, y) \in E,\ x \in P,\ y \notin P:\ \delta(x,y) \leftarrow \delta(x,y) + 1$$

Algorithm RECOD: Retimes the DDG
Input : DDG
Output : Retimed DDG with fewer PSDs

Begin
  DDG_noscc = remove_scc(DDG)
  edge(u,v) = heuristic_select(DDG_noscc)
  if (edge(u,v) = ∅) then return (DDG, failure)
  V_c = {connected component to which u belongs}
  S = {v ∈ V_c | there is a path from u to v}
  P = {v ∈ V_c | there is a path from v to u} ∪ {u}
  R = V_c - (S ∪ P)
  partition(R, P, S)
  for each x ∈ V_c
    if (x ∈ P) then λ(x) = λ(x) + 1 endif
  endfor
  for each (x, y) ∈ E_c
    if (x ∈ P AND y ∈ S) then δ(x,y) = δ(x,y) + 1 endif
  endfor
  copy_changes(DDG_noscc, DDG)
  return (DDG, success)
end

Figure 5: RECOD: Algorithm
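The text states only that RECOD partitions R with a simulated annealing based partitioner. The sketch below is one possible realization under assumptions of its own: the state is the set of tasks placed on the P side; a move picks one task of R and drags along its successors in R (when moving to S) or its predecessors in R (when moving to P), so the closure rule above always holds; and the cooling schedule (`t0`, `alpha`, `steps`) is arbitrary. It reuses the invented example graph with P = {a, u} and R = {b, c}.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def _closure(start: str, adj: Dict[str, List[str]], within: Set[str]) -> Set[str]:
    """`start` plus everything reachable from it through `adj`, inside `within`."""
    out, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt in within and nxt not in out:
                out.add(nxt)
                stack.append(nxt)
    return out


def cut_cost(nvars: Dict[Edge, int], p_side: Set[str]) -> int:
    """Sum of var(e) over dependencies going from the P side to the S side."""
    return sum(n for (x, y), n in nvars.items() if x in p_side and y not in p_side)


def anneal_partition(nvars, p, r, steps=2000, t0=10.0, alpha=0.995, seed=0):
    rng = random.Random(seed)
    succ: Dict[str, List[str]] = defaultdict(list)
    pred: Dict[str, List[str]] = defaultdict(list)
    for x, y in nvars:
        succ[x].append(y)
        pred[y].append(x)
    r_list = sorted(r)
    p_side = set(p)                  # start at cutset c1: all of R on the S side
    cost = cut_cost(nvars, p_side)
    best, best_cost = set(p_side), cost
    temp = t0
    for _ in range(steps):
        if not r_list:
            break
        node = rng.choice(r_list)
        if node in p_side:           # move to S together with its successors in R
            cand = p_side - _closure(node, succ, r)
        else:                        # move to P together with its predecessors in R
            cand = p_side | _closure(node, pred, r)
        new_cost = cut_cost(nvars, cand)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            p_side, cost = cand, new_cost
            if cost < best_cost:
                best, best_cost = set(p_side), cost
        temp *= alpha
    return best, best_cost


nvars = {("a", "u"): 16, ("u", "v"): 16, ("v", "w"): 4, ("b", "w"): 4, ("a", "c"): 8}
best, cost = anneal_partition(nvars, p={"a", "u"}, r={"b", "c"})
print(sorted(best), cost)  # ['a', 'c', 'u'] 16
```

On this toy graph the resulting cut c3 (cost 16) is cheaper than both c1 (24) and c2 (20), which is the situation the partitioning step is meant to exploit.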

4.3  RECOD:  Algorithm 

The algorithm for the retiming transformation is shown in Figure 5. The functions used in the algorithm are
briefly explained as follows. The function remove_scc() replaces every strongly connected component
$scc_i$ (or recurrence) in the DDG with a single task $u_{scc_i}$ and returns a new graph $DDG_{noscc}$. The
dependencies that are part of a recurrence $scc_i$ are not present in $DDG_{noscc}$; all dependencies to and
from any task in $scc_i$ now connect to the single task $u_{scc_i}$. We use $DDG_{noscc}$ for retiming. Removing
all the SCC tasks and dependencies ensures that no dependency belonging to a recurrence is retimed, although
the $\lambda$ of all the tasks belonging to a recurrence might be increased. The changes are reflected in the
original DDG by the function copy_changes(). The function heuristic_select() heuristically selects a
dependency to be retimed (see Section 4.1). The function partition(), as the name suggests, partitions R
between P and S (see Section 4.2). The two for-loops perform the retiming: the first increases the $\lambda$
of all tasks $u \in P$, and the second increases the $\delta$ of all dependencies $(u, v)$ with $u \in P$, $v \in S$.

Figure 6: DDG for JPEG-like Compression Algorithm

| id | Task | SW time (ns) | HW time (ns) |
|----|------|--------------|--------------|
| 1 | FDCT | 371300 | 8400 |
| 2 | Quant. | 7560 | 600 |
| 3 | ZigZag | 1630 | 400 |
| 4 | RLE & Huff. | 18480 | 884000 |

Table 1: SW and HW run times for various JPEG tasks
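The collapsing step performed by remove_scc() (Section 4.3) can be sketched with Tarjan's strongly connected components algorithm. This is an illustrative reconstruction, not RECOD's code: it merges parallel dependencies into one (a simplification), names each collapsed task `scc(...)`, and returns the task-to-representative map that a copy_changes()-style step could use to map $\lambda$ changes back onto the original DDG.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def strongly_connected_components(nodes: List[str], edges: List[Edge]) -> List[Set[str]]:
    """Tarjan's algorithm (recursive, adequate for small task graphs)."""
    adj: Dict[str, List[str]] = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    comps: List[Set[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in adj[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:           # v is the root of an SCC
            comp: Set[str] = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            comps.append(comp)

    for v in nodes:
        if v not in index:
            visit(v)
    return comps


def remove_scc(nodes: List[str], edges: List[Edge]):
    """Collapse each recurrence into one task and drop the edges inside it."""
    self_loops = {a for a, b in edges if a == b}
    rep: Dict[str, str] = {}
    for comp in strongly_connected_components(nodes, edges):
        if len(comp) > 1 or comp & self_loops:
            name = "scc(" + ",".join(sorted(comp)) + ")"
        else:
            name = next(iter(comp))
        for v in comp:
            rep[v] = name
    new_nodes = sorted(set(rep.values()))
    new_edges = sorted({(rep[a], rep[b]) for a, b in edges if rep[a] != rep[b]})
    return new_nodes, new_edges, rep


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")]
new_nodes, new_edges, rep = remove_scc(nodes, edges)
print(new_edges)  # [('a', 'scc(b,c)'), ('scc(b,c)', 'd')]
```

Because the recurrence b, c becomes a single task, no dependency inside it can be selected for retiming, while $\lambda$ of the collapsed task may still change.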


5  Experimental  Results 

To demonstrate the effectiveness of the retiming heuristic in HW/SW codesign, we consider the design of
a JPEG-like [14] compression algorithm. The DDG of the specification is shown in Figure 6. It consists
of four tasks: Forward Discrete Cosine Transform (FDCT), Quantization, Zig-Zag, and RLE and Huffman
encoding. All the dependencies have $\delta = 0$, and the number of variables transferred across each
dependency is 16. The run times of the tasks in SW and HW are shown in Table 1 [19]. Table 2 shows the
estimated throughput and memory requirements for various bindings of the tasks. Columns two to five give
the bindings of the tasks. The sixth and seventh columns give the run time and memory requirement of the
non-pipelined design of the application. The eighth column gives the MII of the pipelined implementation.
Columns nine and ten give the achieved II and the memory requirement of the pipelined implementation. The
speed-up and increase in memory requirement due to the pipelined implementation are in columns eleven and
twelve respectively. In the table we have exhaustively bound all the tasks to SW and HW; since there are
four tasks, the table has sixteen rows. The results show that we were always able to schedule the DDG with
an II equal to the MII. We can achieve a speed-up of up to 1.6 (row 15). The maximum
Abstract

Hardware/software CoSynthesis is a complex process that involves transforming a high-level system
specification into an implemented hardware/software system that meets the specification constraints. One
phase of the CoSynthesis process is described here: partitioning the specification into components and
binding them to hardware/software resources. Partitioning requires an effective means to explore the
design space; challenges include (1) supporting constraint-driven retrieval and (2) evaluating candidate
solutions considering the interaction of multiple constraints. The CoSynthesis Tool described here assigns
scores to candidate solutions using multiple design constraints, but rather than the simple sum approach
predominant in CoSynthesis research, it uses a vector of rank data that does not require that equal weight
be given to all criteria. Our results to date show that not only can we process a scalable, selectable set
of design constraints, but when compared with a 2-constraint Fiduccia-Mattheyses (FM) approach, we achieve
better results. Flexible component retrieval is accomplished using our database system; the database is
unique for three reasons: (1) it uses a hardware description language as the basis for its conceptual
model, (2) it allows flexible, ad hoc querying over designs, and (3) it uses a fine granularity of
component modeling to enable the detailed search conditions required by the CoSynthesis Tool.


1.  Introduction 

Hardware/software CoDesign and CoSynthesis can be characterized as a binding problem: binding components
from a database to functional specifications in order to create a hardware/software system that carries
out the desired functionality and meets performance constraints. The CoDesign methodology used in our
research is embodied in the hardware/software CoDesign and CoSynthesis project called COMET [Vem94]. The
general goal of COMET is to transform high-level system specifications into application-specific
electronic signal processing modules using a hardware/software CoSynthesis process, and to produce working
hardware within a two-week time period. HW/SW CoDesign/CoSynthesis is assumed to be the requisite approach
for reducing the development cycle [Gaj94] and time to market. Current time to market for a complex HW/SW
system is approximately 18 months [Keu94].

An abstract representation of the major COMET system components is given in Figure 1. A user supplies a
system specification that is divided into modules, matched to component specifications, and then allocated
to either hardware or software synthesis processes. The CoSynthesis process is iterative; alternate
bindings are used to satisfy constraints such as performance and area requirements. The CoSynthesis Tool
issues requests to the design database using qualifications on design properties, and the query processor
determines the set of design objects that subsume the request. In other words, a query is a module
description, and any modules in the database that have at least the desired functionality (possibly
additional functionality) are returned. The CoSynthesis Tool analyzes candidate solutions and determines
the best assignment of resources to hardware and software using an iterative binding algorithm. The
hardware and software specifications are processed by hardware and software synthesis tools, then
integrated to form a system that satisfies the initial specifications. The end result of these
transformations is an application-specific hardware design that can be fabricated, along with the embedded
software that will be executed on the manufactured hardware. The shaded portion of Figure 1 highlights the
subsystems described in this paper: the CoSynthesis Tool and the design database.
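The subsumption rule can be illustrated with a toy model in which a module's functionality is reduced to a set of function names. That reduction is an assumption made only for this sketch (the actual database matches against its VHDL-based conceptual model), and the module names are invented. A module answers a query when its function set is a superset of the requested one.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Module:
    name: str
    functions: FrozenSet[str]


def subsuming_modules(query: FrozenSet[str], library: List[Module]) -> List[Module]:
    """Modules that provide at least the requested functions (possibly more)."""
    return [m for m in library if query <= m.functions]


library = [
    Module("adder", frozenset({"add"})),
    Module("alu", frozenset({"add", "sub", "and"})),
    Module("multiplier", frozenset({"mul"})),
]
hits = subsuming_modules(frozenset({"add"}), library)
print([m.name for m in hits])  # ['adder', 'alu']
```

The ALU is returned even though it offers more than was asked for, matching the "possibly additional functionality" clause.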


Figure  1.  Application  Environment. 

2.  The  CoSynthesis  Tool 

The goal of the HW/SW CoSynthesis tool is to allocate hardware and software resources for the modules given
in a high-level system specification. Input to the CoSynthesis Tool specifies the system functionality and
the performance constraints levied on the system by the designer. These constraints can specify (but are
not limited to) the final design's size, weight, power consumption, heat dissipation, and speed. The output
of the CoSynthesis tool consists of bindings of modules to resources. The resources come from a predefined
component database. It is the interplay of their attributes (size, weight, power, etc.) with particular
bindings of resources to actions that determines how well the final design meets the performance
constraints [Mil95]. In this paper, our preliminary implementation produces a VHDL configuration body as
output. VHDL uses configuration bodies to specify bindings between components within a design and their
implementation in a VHDL library of components. Extensions to this research have the goal of producing a
configuration body and an updated architecture reflecting hardware and software resource allocations.

The  relationship  of  our  CoSynthesis  algorithm 
and  algorithms  used  for  traditional  hardware  partitioning 
is  described  in  Section  2.1.  Our  algorithm  is  proposed 
in  Section  2.2. 

2.1  Related  Work 

Iterative techniques such as Simulated Annealing (SA), Kernighan-Lin (KL), Fiduccia-Mattheyses (FM), and
Genetic Algorithms (GA) are commonly used in hardware partitioning [She94] and have been in use for a decade
or more [Bha94]. Hardware partitioning provides a means for breaking a system design into smaller, more
manageable pieces based primarily on the number of communication channels between the pieces. Hardware
partitioning is not limited to one level of design abstraction or even one application area. It can be
used to facilitate design packaging [Bha94], design layout [Bha94], simulation and test [Cha94], Rapid
Prototyping [Cha94], and logic minimization [Con94].

Given an initial partitioning of a system into two halves, iterative techniques move one circuit component
(node), or pairs of nodes, between the partitions in an effort to minimize a single constraint or a pair of
constraints. At the core of these algorithms is the manner in which they select the "best node" within the
system graph to move between partitions. These techniques are a natural extension for HW/SW CoSynthesis and
are the core iterative technique of many CoDesign or CoSynthesis approaches [Ben93] [Cai96] [Gaj94] [Gup93]
[Hen96] [Yeh95].

In the HW/SW CoSynthesis context, the two hardware partitions become the software and hardware partitions
respectively. The movement of system nodes between the two is accomplished by rebinding the node's physical
implementation from hardware to software or vice versa. However, while cutset minimization remains a
meaningful design constraint, area balancing does not. Further, one of the COMET project's goals is to
facilitate additional design constraints in the CoSynthesis process. The iterative improvement algorithms
are limited in their ability to readily add design constraints because of the way they select the
"best node" to move between partitions.

The two most common hardware partitioning algorithms differ in how they select the "best node" to move. The
Fiduccia-Mattheyses method (FM) [She94] for hardware partitioning starts from an initial partitioning of the
system graph. It proceeds by rank ordering all the tasks in the graph based on how moving a task from one
chip to the other impacts the overall inter-chip communication (cutset). Next, the algorithm steps through
the rank-ordered list and selects the first task that reduces the cutset and does not violate a
predetermined size balance (usually set at 40-60%) between the two chips. This task is then moved to the
other partition and the ranked list is updated. This process repeats until all tasks have been moved. The
history of all task moves is then examined to find the point in the process where the cutset is minimized.
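To make the FM description concrete, here is a simplified FM-style pass on an undirected graph with unit-size nodes. It simplifies FM considerably (gains are recomputed from scratch instead of being kept in bucket structures) and, as in the usual textbook formulation, takes the best-gain move that respects the 40-60% balance even when its gain is negative, so that every task can be moved once before the best point in the move history is kept. The example graph and starting split are invented.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def cutset(edges: List[Edge], side: Dict[str, int]) -> int:
    return sum(1 for a, b in edges if side[a] != side[b])


def gain(v: str, edges: List[Edge], side: Dict[str, int]) -> int:
    """Reduction in cutset if v switches partitions."""
    g = 0
    for a, b in edges:
        if a != b and v in (a, b):
            other = b if a == v else a
            g += 1 if side[other] != side[v] else -1
    return g


def fm_pass(edges: List[Edge], side: Dict[str, int], lo: float = 0.4, hi: float = 0.6):
    side = dict(side)
    n = len(side)
    unlocked = set(side)
    history = [(cutset(edges, side), dict(side))]
    while unlocked:
        ranked = sorted(unlocked, key=lambda v: (-gain(v, edges, side), v))
        size0 = sum(1 for s in side.values() if s == 0)
        for v in ranked:                      # best-gain node that keeps balance
            new_size0 = size0 - 1 if side[v] == 0 else size0 + 1
            if lo <= new_size0 / n <= hi:
                side[v] ^= 1
                unlocked.discard(v)           # lock the moved node
                history.append((cutset(edges, side), dict(side)))
                break
        else:
            break                             # no legal move remains
    return min(history, key=lambda h: h[0])   # best prefix of the move sequence


# Two chains a0..a4 and b0..b4 joined by a4-b0, starting from a poor split.
edges = [("a0", "a1"), ("a1", "a2"), ("a2", "a3"), ("a3", "a4"),
         ("b0", "b1"), ("b1", "b2"), ("b2", "b3"), ("b3", "b4"), ("a4", "b0")]
start = {v: 0 for v in ["a0", "a1", "a2", "b3", "b4"]}
start.update({v: 1 for v in ["a3", "a4", "b0", "b1", "b2"]})
best_cut, best_side = fm_pass(edges, start)
print(cutset(edges, start), best_cut)  # 2 1
```

The pass is driven by a single quantity, the cutset gain, which illustrates why adding further constraints to this kind of node selection is awkward.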
The Ratio Cut method [Wei91] [Cha94] evaluates the tasks based on the following equation:

$$R = \frac{C}{|A| \cdot |A'|}$$

where A and A' are the two hardware partitions and C is the cutset between the partitions. This equation
takes into account the number of communication lines and the relative sizes of the two partitions. Once all
the tasks have been evaluated, the task with the smallest value, R, is selected for movement.
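A small sketch of this evaluation, assuming unit-size tasks and a partition stored as a task-to-side map (a hypothetical representation): each task is tentatively moved, R is recomputed, and the move with the smallest R is reported. Moves that would empty a partition are skipped, since R is undefined there.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]


def ratio_cut(edges: List[Edge], side: Dict[str, int]) -> float:
    """R = C / (|A| * |A'|) for a two-way partition given as task -> 0 or 1."""
    c = sum(1 for a, b in edges if side[a] != side[b])
    size_a = sum(1 for s in side.values() if s == 0)
    return c / (size_a * (len(side) - size_a))


def best_ratio_move(edges: List[Edge], side: Dict[str, int]) -> Optional[Tuple[float, str]]:
    """Evaluate R after moving each task; return (smallest R, task)."""
    best = None
    for v in sorted(side):
        trial = dict(side)
        trial[v] ^= 1
        if 0 < sum(trial.values()) < len(trial):    # both partitions stay non-empty
            r = ratio_cut(edges, trial)
            if best is None or r < best[0]:
                best = (r, v)
    return best


edges = [("a", "b"), ("b", "c"), ("c", "d")]
side = {"a": 0, "b": 1, "c": 1, "d": 1}
print(ratio_cut(edges, side))         # 0.3333333333333333
print(best_ratio_move(edges, side))   # (0.25, 'b')
```

Moving b keeps the cutset at 1 but balances the partitions, so the product in the denominator grows and R drops.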

The  FM  method  may  be  extended  for 
additional  constraints,  but  either  each  constraint  must  be 
expressed  as  a  range  or  the  task  ranking  must  be  based 
on  an  equation  that  incorporates  the  results  of  the 
constraint  evaluations  as  a  simple  sum.  The  first 
method  is  imprecise;  the  second  mixes  incomparable 
attributes.  The  Ratio  Cut  method  suffers  from  the  same 
restrictions. 

Our  algorithm  improves  on  the  iterative 
improvement  technique  by  selecting  the  “best  node”  for 
rebinding  rather  than  the  first  node  that  is  acceptable,  as 
well  as  allowing  additional  constraints  to  be  added  easily 
to  the  evaluation  process.  Our  work  is  primarily 
influenced  by  techniques  from  hardware  partitioning,  but 
we  have  taken  an  approach  similar  to  that  of  the 
DESTINATION  project  [Mai96]  for  assigning  tasks  to 
processors  in  complex  computer  systems.  They 
consider  multiple  constraints  with  user-defined  weights 
combined  into  a  single  objective  function,  similar  to 
our  approach. 

2.2  CoSynthesis  Algorithm 

The new algorithm, called SCOREBOARD, has its roots in the FM method. For each constraint specified by the
system specification, our algorithm maintains a separate rank-ordered list of the nodes that may be
rebound. Each constraint specifies the scalar value of one dimension of a node's ranking vector. The "best
node" to move is selected by choosing the node with the smallest vector from the set of candidates that
may be rebound. After preliminary system definitions in Sections 2.2.1 and 2.2.2, the algorithm is
described in Section 2.2.3.

2.2.1  Component  Database 

During CoSynthesis, all nodes from the system are bound to a specific implementation from a database or
library of hardware and software components. The component library, L, consists of components, $l_{j,k}$,
where:

j specifies the class or functionality of the library component, and

k specifies the particular implementation for the component.

Using VHDL as the design language, VHDL entity/architecture pairs represent the j's and k's. Additionally,
for each $l_{j,k}$ component there exists a set of performance attributes, $p_i$, and a set of functions,
$f_j$. Sample $p_i$'s include size, cost, weight, and area. Further, for a given j, all $l_{j,k}$ components
implement the same function, $f_j$. The task of the CoSynthesis Tool is to bind components from the library
to nodes within the system such that the functions ($f_j$) of a bound component ($l_{j,k}$) match those of
the node in the system, and the aggregate system performance attributes satisfy the system-level
constraints levied by the designer. The data model and flexible retrieval mechanism are further described
in Section 3.

2.2.2  System  Definition 

The input to the HW/SW CoSynthesis tool, $S_{in}$, is defined as a triple (G, C, B), where:

G is a dataflow hypergraph, denoted (V, E), where

  V is the set of all nodes, $v_i$, of the graph G, and

  E is the set of all edges, denoted as $\{(v_i, K)\}$, where K is a subset of V.

C is a set of performance constraints, $c_i$, that specify the system's performance constraints. (Sample
$c_i$ are area, weight, power consumption, and time delay.)

B is a binding set in which a binding, denoted $(v_i, l_{j,k})$, associates one $v_i \in V$ with one and only
one $l_{j,k} \in L$. Initially, B can be either the empty set or a user-specified set of bindings.

Output from the HW/SW CoSynthesis tool, $S_{out}$, is defined similarly to $S_{in}$. The output system is a
triple, (G, A, B), where G and B are defined as above and

A is a set of system performance attributes. Each $a_i \in A$ is calculated by a specific constraint
analyzer in the SCOREBOARD tool and is based on the performance attributes, $p_i$, associated with
components of the binding set, B, and their satisfaction of the set of performance constraints, C, of
$S_{in}$.

Associated with the constraints of the input system, C, and the attributes of the output system, A, is a
constraint satisfaction function $X(c_i, a_i)$. This function determines whether or not the attribute $a_i$
of the output system achieves the desired goal set by the input $c_i$. An example is area: $X(c_i, a_i)$
compares the output system's area (a simple sum of the area of the bound components) with the designer's
input area constraint. The goal of HW/SW CoSynthesis is then:
Given: $S_{in} = (G, C, B)$, where B is initially either the empty set or a user-specified set of bindings.
Create: $S_{out} = (G, A, B)$ in which

$\forall i,\ v_i \in V$, $\exists$ a binding $(v_i, l_{j,k})$ of $v_i$ to a specific $l_{j,k} \in L$ such that

$\forall i,\ c_i \in C$ and $a_i \in A$, the constraint satisfaction function $X(c_i, a_i)$ is satisfied.
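The goal statement can be made concrete for the area example. In this sketch a component $l_{j,k}$ carries a single attribute (area), the area constraint is assumed to be an upper bound, and the component names and numbers are invented for illustration. A binding maps each node to exactly one component, so "one and only one" holds by construction of the dictionary.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Component:       # one library component l_{j,k}
    func: str          # j: class / functionality
    impl: str          # k: particular implementation
    area: float        # one performance attribute p_i (units assumed)


def system_area(binding: Dict[str, Component]) -> float:
    """Output attribute a_area: simple sum of the area of the bound components."""
    return sum(c.area for c in binding.values())


def area_satisfied(max_area: float, binding: Dict[str, Component]) -> bool:
    """X(c_area, a_area), assuming the area constraint is an upper bound."""
    return system_area(binding) <= max_area


def is_complete(node_funcs: Dict[str, str], binding: Dict[str, Component]) -> bool:
    """Every node v_i has exactly one binding, to a component with its function."""
    return (binding.keys() == node_funcs.keys()
            and all(binding[v].func == f for v, f in node_funcs.items()))


fft_hw = Component("fft", "hw", 40.0)
fft_sw = Component("fft", "sw", 5.0)
fir_hw = Component("fir", "hw", 25.0)
nodes = {"n1": "fft", "n2": "fir"}

b1 = {"n1": fft_sw, "n2": fir_hw}
b2 = {"n1": fft_hw, "n2": fir_hw}
print(is_complete(nodes, b1), area_satisfied(35.0, b1))  # True True
print(is_complete(nodes, b2), area_satisfied(35.0, b2))  # True False
```

Both bindings match every node's function, but only the first meets the 35.0 area limit; a full tool would evaluate one such function per constraint.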

2.2.3  Algorithm 

Our approach improves on the iterative partitioning technique by incorporating a three-step evaluation
process for selecting the "best node" to move based on user-supplied constraints. Prior to algorithm
execution, the nodes of the system are initially bound to an implementation (hardware or software) from the
component library, and all nodes in the graph are unlocked. The algorithm, outlined in Figure 2, proceeds
as follows. Each constraint maintains a separate rank-ordered list. During the first step, denoted by [1]
in Figure 2, system nodes are inserted into each constraint's ordered list based on the impact of the
node's potential movement (rebinding) on the overall system. Based on each node's score in these ordered
lists, constraint ranks are assigned to the nodes during step [2]; these constraint ranks are the scalar
values of the node's rebinding vector. Finally, in step [3], the rebinding vectors for the nodes are
examined and the node with the shortest vector (Euclidean norm) is selected for rebinding. The node is
bound to its alternate implementation and locked, and the three steps are repeated until no further node
rebindings are possible.


While (ULTasks ≠ ∅) {
    FOR EACH (CA) {
        [1] CA->Score(ULTasks);
        [2] CA->Rank(ULTasks);
    }
    [3] Task2Rebind = SVector(ULTasks);
    Rebind(Task2Rebind);
    LTasks = LTasks ∪ Task2Rebind;
    ULTasks = ULTasks - Task2Rebind;
}
Where ULTasks = Unlocked Tasks
      CA      = Constraint Analyzer
      LTasks  = Locked Tasks
      SVector = ShortestVector routine

Figure  2.  SCOREBOARD  Algorithm. 
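The loop in Figure 2 can be sketched as follows. The text does not specify how scores become ranks, so this sketch assumes each analyzer returns a score for rebinding a node (lower is better) and converts the ordering into a normalized rank between 0.0 (best) and 1.0 (worst). The analyzer interface, `flip`, and the per-node numbers in the example are hypothetical; ties in vector length are broken by node name. It needs Python 3.8+ for multi-argument `math.hypot`.

```python
import math
from typing import Callable, Dict, List

# Hypothetical analyzer interface: score for rebinding `node` under `binding`
# (lower is better for that constraint).
Analyzer = Callable[[str, Dict[str, str]], float]


def normalized_ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Best score -> 0.0, worst -> 1.0 (assumed normalization)."""
    order = sorted(scores, key=lambda v: (scores[v], v))
    if len(order) == 1:
        return {order[0]: 0.0}
    return {v: i / (len(order) - 1) for i, v in enumerate(order)}


def scoreboard(binding: Dict[str, str], analyzers: List[Analyzer],
               flip: Callable[[str], str]) -> List[str]:
    unlocked = set(binding)
    locked: List[str] = []
    while unlocked:
        vectors: Dict[str, List[float]] = {v: [] for v in unlocked}
        for ca in analyzers:
            scores = {v: ca(v, binding) for v in unlocked}     # step [1]
            for v, rank in normalized_ranks(scores).items():   # step [2]
                vectors[v].append(rank)
        # step [3]: shortest rebinding vector (Euclidean norm)
        task = min(sorted(unlocked), key=lambda v: math.hypot(*vectors[v]))
        binding[task] = flip(binding[task])                    # rebind
        unlocked.remove(task)                                  # ...and lock
        locked.append(task)
    return locked


area_delta = {"x": 5.0, "y": -2.0, "z": 1.0}   # invented per-node effects
cut_delta = {"x": -1.0, "y": 0.0, "z": 2.0}
analyzers = [lambda v, b: area_delta[v], lambda v, b: cut_delta[v]]
binding = {"x": "SW", "y": "SW", "z": "SW"}
order = scoreboard(binding, analyzers, flip=lambda impl: "HW" if impl == "SW" else "SW")
print(order, binding)  # ['y', 'x', 'z'] {'x': 'HW', 'y': 'HW', 'z': 'HW'}
```

Node y wins the first round with vector (0.0, 0.5) even though it is best for only one constraint, showing how the vector norm trades constraints off without summing raw attribute values.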


The components under consideration for rebinding are initially retrieved from the database using the
constraints as part of a criteria-based search (a query). Traditionally, each VHDL-based tool must contain
its own parser and mechanism for searching VHDL design units. Our approach is to use a design database and
query language facilities rather than incorporating this functionality in each tool within the COMET
environment.

3.  The  Design  Database 

Many of the tools in the COMET environment, such as tools for partitioning, synthesis, and performance
estimation, as well as tools in industrial design environments, are VHDL-based. The general goals of our
design database are (1) that it should "understand" VHDL, and (2) that it should allow flexible retrieval of
components specified in VHDL. We accomplish these goals by defining a conceptual data model that is
implemented in our database system Odyssey [Ven95] [Ven96a]. VHDL can be used as input to or obtained as
output from the database, in addition to accessing data through other interfaces. We define a general query
language that provides an interactive, stand-alone interface, or can be used by tools to retrieve designs.
In this way, we can interface with existing tools and additionally allow greater flexibility for browsing
and retrieving components from design libraries. Users of the database gain query and view facilities as
well as more flexible storage management than with traditional file-based VHDL environments.

Others have developed specialized databases for VLSI CAD [Sie89] [Kim90] [Nay91] [Wag92]; however, our
research is the first that we are aware of to use a hardware description language as a database
description language. Wagner examines some of the issues in using HDLs for database description [Wag95],
but models designs at a coarser granularity. Modeling at a finer level of granularity permits queries on
information regarding entity ports that may be of prime interest in the CoSynthesis process. For example,
numerical accuracy may be an additional constraint imposed by the system specification; during system
CoSynthesis, tradeoffs can be made to achieve a particular system numerical accuracy based on the bus
widths of the components used in the system.

Our  approach  to  design  data  modeling  and 
retrieval  is  to  parse  and  store  VHDL  source  using  our 
conceptual  model.  The  components  can  be  directly 
accessed  through  a  query  interface,  either  by  designers  or 
tools.  The  instances  can  also  be  restored  to  VHDL  so 
that  legacy  tools  may  access  designs  placed  in  the 
database  regardless  of  their  source. 
$V_C = 0.0\,i + 1.0\,j$, $|V_C| = 1.0$
$V_R = 0.583\,i + 0.0\,j$, $|V_R| = 0.583$
$V = 1.0\,i + 0.866\,j$, $|V| = 1.322$

For  this  example,  the  reverser  has  the  smallest 
rebinding  vector  and  is  the  best  candidate  to  rebind  for 
this  iteration  of  the  algorithm.  It  is  rebound  and  locked 
(eliminating  it  from  consideration  in  the  future). 
Finally,  new  system  attribute  values  are  calculated 
(Figure  8)  and  the  algorithm  repeats  until  all  nodes  have 
been  rebound. 
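The selection in this example can be checked numerically. The label of the third vector is not legible in the source, so the sketch simply calls it `other`; the components are taken from the values above.

```python
import math

# (i, j) components of the rebinding vectors listed above.
vectors = {"V_C": (0.0, 1.0), "V_R (reverser)": (0.583, 0.0), "other": (1.0, 0.866)}
norms = {name: math.hypot(i, j) for name, (i, j) in vectors.items()}
for name, norm in norms.items():
    print(f"{name}: {norm:.3f}")
print("rebind:", min(norms, key=norms.get))
```

This prints 1.000, 0.583 and 1.323 and selects the reverser; the exact third norm is about 1.32286, so the 1.322 above is that value truncated to three decimals.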
Figure 8. System Attributes after Rebinding.

If  the  system  constraints  have  not  been  met,  the  best 
solution  achieved  by  the  algorithm  can  be  used  as  the 
initial  bindings  and  the  algorithm  re-executed. 

5.  Results  and  Analysis 

An object-oriented experimental SCOREBOARD system has been prototyped in C++. It accepts a VHDL entity/architecture pair and a constraint description. The VHDL input describes the system as a netlist of instantiated components, while the constraint description indicates which constraint analyzers and goals to include in the SCOREBOARD algorithm. Although instantiated components are only a subset of the VHDL constructs that can be used to model systems, our approach is extensible to any concurrent VHDL task (processes, blocks, concurrent signal assignments, procedure calls, etc.). Currently six primitive constraints are supported: cutset minimization, cutset maximum value, area minimization, area maximum value, cost minimization, and cost maximum value. The "minimization" constraint analyzers attempt to minimize their particular system attribute; the "maximum value" analyzers attempt to reduce a system attribute only until it no longer exceeds its specified maximum value. Inheritance from a common constraint analyzer base class facilitates the creation and manipulation of additional analyzers within the SCOREBOARD system. The output is a revised VHDL architecture dividing the system into hardware and software components, and a VHDL configuration body binding the instantiated components to library elements. Experimental data show that this algorithm produces better two-constraint designs than existing iterative improvement methods. Further, the algorithm's complexity is similar to that of existing hardware partitioning techniques [She94], namely $O(n^2)$, where $n$ is the number of nodes in the system.
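The analyzer hierarchy and best-node selection described above can be illustrated with a short sketch. It is written in Python rather than the prototype's C++, and every name in it is hypothetical. A binding maps each node to 0 (software) or 1 (hardware). Each analyzer reports the improvement that rebinding a node would bring to its own attribute. As an assumption not stated in the text, candidate nodes are ranked by the plain sum of their rebinding-vector entries.

```python
from abc import ABC, abstractmethod

# Hypothetical sketch: binding maps node -> 0 (software) or 1 (hardware).


class ConstraintAnalyzer(ABC):
    """Common base class; each subclass measures one system attribute."""

    @abstractmethod
    def attribute(self, binding):
        """Return the attribute value (e.g. cutset size) for a binding."""

    def penalty(self, binding):
        # "Minimization" analyzers try to drive the attribute itself down.
        return self.attribute(binding)

    def gain(self, binding, node):
        # Improvement obtained by moving `node` to the other partition.
        rebound = dict(binding)
        rebound[node] = 1 - binding[node]
        return self.penalty(binding) - self.penalty(rebound)


class MaxValueAnalyzer(ConstraintAnalyzer):
    """'Maximum value' variant: only the excess over a limit is penalized."""

    def __init__(self, limit):
        self.limit = limit

    def penalty(self, binding):
        return max(0, self.attribute(binding) - self.limit)


class CutsetMin(ConstraintAnalyzer):
    def __init__(self, edges):
        self.edges = edges

    def attribute(self, binding):
        # Number of edges whose endpoints lie in different partitions.
        return sum(binding[u] != binding[v] for u, v in self.edges)


class AreaMin(ConstraintAnalyzer):
    def __init__(self, hw_area):
        self.hw_area = hw_area

    def attribute(self, binding):
        # Total area of the nodes currently bound to hardware.
        return sum(a for n, a in self.hw_area.items() if binding[n] == 1)


class AreaMax(MaxValueAnalyzer, AreaMin):
    # penalty() comes from MaxValueAnalyzer, attribute() from AreaMin.
    def __init__(self, hw_area, limit):
        AreaMin.__init__(self, hw_area)
        MaxValueAnalyzer.__init__(self, limit)


def rebinding_vector(binding, node, analyzers):
    # One scalar per analyzer; adding an analyzer adds a dimension.
    return [a.gain(binding, node) for a in analyzers]


def best_node(binding, analyzers):
    # Assumption: rank candidates by the sum of their vector entries.
    return max(binding, key=lambda n: sum(rebinding_vector(binding, n, analyzers)))
```

Adding a further analyzer, such as a cost analyzer, simply appends one more entry to every rebinding vector without changing the selection code.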

The following two examples depict the attributes of a synthesized system as the SCOREBOARD algorithm iterates to completion. Both examples were generated from the same input system, an ISCAS 85 benchmark [ISC85] consisting of 1350 nodes. In the first example, the SCOREBOARD algorithm had three goals: minimize the system cutset, minimize the system area, and balance the respective sizes of the HW and SW partitions. In practice, the third goal is of little value in a HW/SW CoSynthesis environment. It is included here to depict a three-constraint example and as a further indication of the algorithm's capability relative to other partitioning methods. The first two constraints, cutset and area minimization, are plotted in Figure 9. The x-axis shows the history of the iterative rebindings for cutset and area; each step along the x-axis is one iteration of the algorithm. If the constraints of interest are cutset and area, the optimal point lies at approximately iteration 700. Figure 10 shows the history of rebindings with respect to total area and to the area balance between hardware and software, where balance is expressed as the percentage that each partition contributes to the whole. Although this consideration is artificial in CoSynthesis, it demonstrates how easily a third constraint can be accommodated in our approach.


Figure 9. SCOREBOARD Cutset and Area.

Figure 10. SCOREBOARD Area Balance (x-axis: iterations).

In the second example, a fourth constraint, cost minimization, was added to the analysis of the same system to illustrate the algorithm's scalability. This constraint adds another dimension to the rebinding vector. Results are presented in Figures 11, 12, and 13. Examining the minimum values achieved for cutset and area balance in example 2 shows that a rebinding that was appropriate in the first example is no longer suitable in the second once the additional constraint is considered.


Figure 11. SCOREBOARD Cutset and Area.

Figure 12. SCOREBOARD Cost and Cutset.

Figure 13. SCOREBOARD Area Balance.


6. Conclusions and Future Work

Conclusions and issues for future work are discussed below.

6.1 Conclusions

The CoSynthesis Tool analyzes candidate solutions and determines the best assignment of resources to hardware and software using an iterative binding algorithm. For each constraint in the system specification, our algorithm maintains a separate rank-ordered list of the system nodes that may be rebound. Our CoSynthesis tool improves on hardware partitioning techniques by selecting the best node for rebinding based on its rebinding vector, rather than the first acceptable node, and by allowing additional constraints to be added easily to the evaluation process.

We have proposed and implemented a data model that stores designs described in VHDL and interfaces with legacy tools (VHDL as file input/output) and new state-of-the-art EDA tools (e.g., CoDesign and CoSynthesis tools) to allow design space exploration via criteria-based searching. The contribution is that tools do not each have to implement their own scanners, parsers, and query evaluators; designers and tools can continue to work with a widely used modeling language and reap the benefits of flexible retrieval.


6.2  Future  Work 

Future research will cover a broad range of both SCOREBOARD and database refinements. Near-term efforts will formally define and characterize the SCOREBOARD algorithm and analyze the quality of the synthesized design. This includes evaluating more realistic constraint analyzers and their impact on both the design process and the algorithm. Allowing user-defined weights on the scalar values of the rebinding vector would be a useful capability. Additionally, the output format will be refined so that the output includes a revised VHDL architecture containing instantiated components representing the hardware and software partitions. The software partitions would be represented as instantiated CPUs and memory executing the software.

Further  research  could  address  the  granularity  of 
HW/SW  CoSynthesis  by  treating  sequential  statements 
of  VHDL  processes  as  individual  nodes.  Designs  that 
define  a  system’s  functionality  at  a  more  abstract, 
algorithmic  level  are  not  supported  in  the  current 
version  of  the  algorithm’s  implementation.  Finally, 
scheduling  and  resource  sharing  would  greatly  aid  the 
HW/SW  CoSynthesis  effort  in  that  duplicate  tasks 
would  not  be  replicated  in  the  system  design. 

Areas  for  future  database  research  include 
investigation  of  query  optimization  and  data  integration. 
Data  sharing  is  facilitated  since  different 
producers/consumers  of  design  data  can  use  the  common 
database.  Data  exchange  and  integration  can  also  be 
facilitated  for  other  EDA  data  formats  and  languages. 
We have investigated interchange issues for VHDL and
the CAD Framework Initiative Design Representation
model  [Ven96b].  Formats  such  as  SDF  [SDF95]  for 
timing  delay  information  pose  additional  challenges  in 
this  area  [Dav96]. 

7. References

[Ben93] T. Benner, R. Ernst, and J. Henkel, "Hardware-Software Cosynthesis for Microcontrollers," IEEE Design and Test, Vol. 10, No. 4, December 1993.

[Bha94] D. Bhatia, Physical Design Automation Course Notes, University of Cincinnati, 1994.

[Car96] C. Carreras, J. Lopez, M. Lopez, C. Delgado-Kloos, N. Martinez, and L. Sanchez, "A Co-Design Methodology Based on Formal Specification and High-level Estimation," Fourth International Workshop on Hardware/Software CoDesign, p. 28.

[Cha94] P. K. Chan, M. Schlag, and J. Y. Zien, "Spectral K-Way Ratio-Cut Partitioning and Clustering," IEEE Trans. on Computer-Aided Design, Vol. 13, No. 9, September 1994, pp. 1088-1095.

[Con94] J. Cong, Z. Li, and R. Bagrodia, "Acyclic Multi-Way Partitioning of Boolean Networks," Proceedings of the 31st ACM/IEEE Design Automation Conference, pp. 670-675.

[Dav96] K.C. Davis, S. Venkatesan, and L.M.L. Delcambre, "Sharing Electronic Design Data Via Semantic Spaces," submitted, 1996.

[Gaj94] D. Gajski, F. Vahid, S. Narayan, and J. Gong, Specification and Design of Embedded Systems, Prentice-Hall, Inc., Englewood Cliffs, NJ, 1994.

[Gup93] R. K. Gupta, "Co-Synthesis of Hardware/Software for Digital Embedded Systems," PhD Dissertation, Stanford University, 1993.

[Hen96] J. Henkel and R. Ernst, "The Interplay of Run-Time Estimation and Granularity in HW/SW Partitioning," Fourth International Workshop on Hardware/Software CoDesign, p. 52.

[ISC85] Inter. Society on Circuits and Systems, 1985.

[Keu94] K. Keutzer, "Hardware-Software Co-Design and ESDA," Proc. of 31st Design Automation Conference, pp. 435-436, 1994.

[Kim90] W. Kim, J. Banerjee, H.-T. Chou, and J.F. Garza, "Object-oriented Database Support for CAD," Computer Aided Design, Vol. 22, No. 8, October 1990, pp. 469-479.

[Mar96] T. Marlowe, A. Stoyenko, P. Laplante, R. Daita, C. Amaro, C. Nguyen, and S. Howell, "Multiple-Goal Objective Functions for Optimization of Task Assignment in Complex Computer Systems," Control Engineering Practice, Vol. 4, No. 2, 1996, pp. 251-256.

[Mil95] R. Miller and H. Carter, "Hardware/Software Partitioning in COMET," Proceedings of the COMET Project Review Meeting, presentation slides, 1995.

[Nay91] T.K. Nayak, A.K. Majumdar, A. Basu, and S. Sarkar, "VLODS: A VLSI Object Oriented Database System," Information Systems, Vol. 16, No. 1, 1991, pp. 73-96.

[SDF95] Standard Delay Format Specification, Version 3.0, Open Verilog International, Los Gatos, CA 95032, May 1995.

[She94] N. Sherwani, Algorithms for VLSI Physical Design Automation, Kluwer Academic Publishers, Norwell, Mass., Second Printing 1994.

[Sie89] E. Siepmann and G. Zimmermann, "Object-Oriented Datamodel for the VLSI Design System PLAYOUT," Proc. of the 26th ACM/IEEE Design Automation Conference, Las Vegas, NV, 1989, pp. 814-817.

[Vem94] R. Vemuri, H. Carter, and P. Alexander, "Board and MCM Level Synthesis for Embedded Systems in the COMET Cosynthesis Environment," Proceedings of the First Annual RASSP Conference, Arlington, VA, August 1994, pp. 124-133.

[Ven94] S. Venkatesan and K.C. Davis, "A Data Model for VHDL Databases," VHDL International Users Forum Spring-94, Oakland, CA, May 1994, IEEE Computer Society Press, pp. 173-182.

[Ven95] S. Venkatesan and K.C. Davis, "Odyssey: An Electronic Design Automation Database," Proc. of the 2nd International Conference on Applications of Databases, Santa Clara, CA, December 1995, pp. 147-157.

[Ven96a] S. Venkatesan, "Database Modeling for Electronic Design Automation Environments," Ph.D. Dissertation, Electrical and Computer Engineering and Computer Science Department, University of Cincinnati, Cincinnati, OH 45221-0030, January 1996.

[Ven96b] S. Venkatesan and K.C. Davis, "A Meta-model and Semantic Mapping Methodology for Hardware Design Data Management," Journal of Integrated Computer-Aided Engineering, Vol. 3, No. 1, January 1996.

[Ven96c] S. Venkatesan and K.C. Davis, "Flexible Component Retrieval for Co-Design," submitted, 1996.

[Wag92] F.R. Wagner, L.G. Golendziner, J. Lacombe, and A. H. Viegas de Lima, "Design Version Management in the STAR Framework," IFIP92, edited by M. Newman and T. Rhyne, Elsevier Science Publishers B.V. (North-Holland), March 1992.

[Wag95] F.R. Wagner, "Design Management Requirements for Hardware Description Languages," Proceedings EURO VHDL 95, 1995.

[Wei91] Y.C. Wei and C.K. Cheng, "Ratio Cut Partitioning for Hierarchical Designs," Transactions on Computer-Aided Design, Vol. 10, No. 7, July 1991, pp. 911-921.

[Yeh95] C. Yeh, C. Cheng, and T. Lin, "Optimization by Iterative Improvement: An Experimental Evaluation on Two-Way Partitioning," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, Vol. 14, No. 2, February 1995, pp. 145-153.


APPENDIX  F: 

A  Retiming  Based  Relaxation  Heuristic  for 
Resource-Constrained  Loop  Pipelining  * 


Vinoo Srinivasan and Ranga Vemuri†


Laboratory  for  Digital  Design  Environments 
Department  of  ECECS 
P.O. Box 210030
University  of  Cincinnati 
Cincinnati,  OH  45221-0030 


*This work was partially supported by the ARPA RASSP program, monitored by the Wright Lab, US-AF, under contract number F33615-93-C-1316, and by the ARPA HPCC program, monitored by the FBI under contract number J-FBI-93-116.
† Author for correspondence: (513)-556-4784 (Voice), (513)-556-7326 (FAX), Ranga.Vemuri@UC.EDU




Abstract 

This paper presents a fast and efficient heuristic for pipelining a loop under resource constraints. The loop is represented as a dependence graph, G, whose nodes are operations that are bound to available resources and whose edges denote the data dependencies between the operations. The data dependencies restrict the degree of parallelism that can be achieved while scheduling the graph. We propose a fast retiming-based graph transformation technique that relaxes the data dependencies in the graph while maintaining functional equivalence. Relaxing data dependencies gives the scheduler more flexibility in placing operations, thereby leading to higher throughput. Our objective is to obtain a retimed graph that, when scheduled, achieves an optimal or near-optimal pipelined steady-state throughput. A detailed algorithm is presented to solve the problem, and we provide results that illustrate its effectiveness.
2 Definitions and Terminology

2.1  Specification  Model 

The target architecture model for our retiming framework consists of a resource set $\mathcal{R} = \{R_1, \dots, R_n\}$. For a given operation $n$, $t(n)_R$ is the execution time of the operation $n$ on the resource $R \in \mathcal{R}$. If the operation $n$ cannot be executed on the resource $R$, then $t(n)_R = \infty$. The loop body is represented by a dependence graph. We assume that the graph is executed several times, corresponding to the iterative computations of the loop, involving varying data sets over time. Our loop representation is an extension of that in UNRET [9] and is similar to the signal processing data flow graph representation in [19].

Definition 2.1 A dependence graph (DG) is a directed graph denoted by a 5-tuple, $G = (V, E, \lambda, \delta, \beta)$. $V$ is the set of nodes representing the operations in the loop. $E$ is the set of directed edges corresponding to the dependencies. $\lambda : V \to \mathbb{N}$ is a mapping that assigns an iteration index to each node in the DG. $\delta : E \to \mathbb{N}$ is a mapping that assigns a non-negative integer delay value to every edge in the DG. $\beta : V \to \mathcal{R}$ is a binding of each node to a resource. □

Iteration index ($\lambda$): Since the DG represents an iterative algorithm, each iteration of the DG execution invokes all the operations in the graph once. Thus, if the DG is executed over $N$ iterations, each node $v \in V$ has $N$ instances $v_1, v_2, \dots, v_{N-1}, v_N$, where $v_i$ is the instance of the node $v$ corresponding to the $i$th iteration of the DG. The subscript $i$ in $v_i$ is the iteration index ($\lambda$).

Dependency delay ($\delta$): Edges in the DG represent data dependencies. A delay of $nD$ on an edge is equivalent to having $n$ delay units on that edge. An edge $u_0 \xrightarrow{k} v_0$ (an edge from node $u_0$ to node $v_0$ with $k$ delay units) implies a data dependence from instance $u_c$ to instance $v_{c+k}$, for $c \ge 0$. In general, an edge $u_i \xrightarrow{k} v_j$ implies a data dependence from instance $u_{i+c}$ to instance $v_{j+k+c}$, for $c \ge 0$.

Depending on the level of granularity, nodes in the DG can range from simple operations like multiplications and additions to complex macro operations like fast Fourier transforms and matrix multiplications. Correspondingly, the resources can range from simple multipliers and adders to off-the-shelf microprocessors and FPGAs. In the rest of this paper we use the word task synonymously with node and operation.
A  DG  is  considered  legal  only  if  the  following  three  conditions  are  satisfied: 

• $\forall v \in V : \lambda(v) \ge 0$ (1)

• $\forall (u \to v) \in E : \delta(u \to v) \ge 0$ (2)

• $\forall$ cycles $c \in G : \delta(c) > 0$, where $\delta(c) = \sum_{(u \to v) \in c} \delta(u \to v)$ (3)

Condition (1) does not permit nodes with negative iteration indices, (2) forces all edges to have non-negative delays, and (3) eliminates the existence of any cycle with zero or negative delay. An initial DG is a DG such that $\forall v \in V : \lambda(v) = 0$. Dependencies between task instances belonging to the same steady-state execution are called local dependencies, while those between task instances of different steady states are called global dependencies. All edges with $\delta(e) = 0$ are called local edges and denote local dependencies, while edges with $\delta(e) > 0$ are called global edges and denote global dependencies.
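The legality conditions and the local/global split can be checked mechanically. The following Python sketch assumes a hypothetical representation in which edges are keys of a dictionary mapping (u, v) to δ(u → v), so parallel edges are not modeled, and missing iteration indices default to 0 as in an initial DG. Condition (3) is tested by observing that, once all delays are non-negative, a cycle has zero total delay exactly when all of its edges are local, so it suffices to check that the local edges form an acyclic graph.

```python
from dataclasses import dataclass, field


@dataclass
class DG:
    """Hypothetical container for G = (V, E, lambda, delta); beta is omitted."""
    nodes: list
    delay: dict                                 # (u, v) -> delta(u -> v)
    index: dict = field(default_factory=dict)   # v -> lambda(v); missing means 0

    def lam(self, v):
        return self.index.get(v, 0)


def split_edges(g):
    """Return (local edges, global edges)."""
    local = [e for e, d in g.delay.items() if d == 0]
    glob = [e for e, d in g.delay.items() if d > 0]
    return local, glob


def local_edges_acyclic(g):
    # Kahn's topological sort restricted to the zero-delay (local) edges.
    local, _ = split_edges(g)
    indeg = {v: 0 for v in g.nodes}
    succ = {v: [] for v in g.nodes}
    for u, v in local:
        succ[u].append(v)
        indeg[v] += 1
    ready = [v for v in g.nodes if indeg[v] == 0]
    visited = 0
    while ready:
        u = ready.pop()
        visited += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return visited == len(g.nodes)


def is_legal(g):
    cond1 = all(g.lam(v) >= 0 for v in g.nodes)     # condition (1)
    cond2 = all(d >= 0 for d in g.delay.values())   # condition (2)
    # Condition (3); the reduction is only valid once (2) holds.
    return cond1 and cond2 and local_edges_acyclic(g)
```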
2.2 Scheduling a DG

Given a dependence graph $G = (V, E, \lambda, \delta, \beta)$, we define the set $E_l = \{e \in E : \delta(e) = 0\}$ to be the set of local dependencies. Only the local dependencies affect the schedulability of $G$. A node $v \in V$ is a head node if and only if there exists no node $u$ such that $u \to v \in E_l$. A node $v \in V$ is a tail node if and only if there exists no node $w$ such that $v \to w \in E_l$. For any node $u \in V$, the execution time of that node on the resource to which it is bound, $\beta(u)$, is called the latency of that node, $l(u)$. For any path $u \rightsquigarrow v \in E_l$ (a path involving only edges in $E_l$), the latency of the path, $l(u \rightsquigarrow v)$, is equal to the sum of the latencies of the nodes that belong to the path. Mathematically,

$$\forall u \in V : l(u) = t(u)_{\beta(u)}$$

$$\forall (u \rightsquigarrow v) \in E_l : l(u \rightsquigarrow v) = \sum_{n \in (u \rightsquigarrow v)} l(n) \qquad (4)$$

A path $p \in E_l$ is a critical path if, for all paths $p' \in E_l$, $l(p) \ge l(p')$. CPL (Critical Path Latency) denotes the latency of the critical path in the dependence graph.

Definition 2.2 : $S(G)$ (Schedule of $G$)

A schedule of the graph $G = (V, E, \lambda, \delta, \beta)$ is a mapping $S : V \to \mathbb{N}$ such that:

$$\forall (u \to v) \in E_l : S(v) \ge S(u) + l(u) \qquad \square$$

The schedule of a graph, $S(G)$ (Definition 2.2), is an assignment of start times for the execution of all the tasks in the graph such that the local data dependencies are not violated. The Initiation Interval (II) of a loop is the time interval between consecutive executions of its steady state. Given a schedule of a loop, the initiation interval of the loop for that schedule, $II_S$, is the difference between the time at which all scheduled tasks have finished execution and the earliest time at which any task was scheduled:

$$II_S = \max_{u \in V} \big(S(u) + l(u)\big) - \min_{u \in V} S(u) \qquad (5)$$

Since all the tasks in the critical path have to be scheduled, $II_S \ge CPL$ for any schedule $S(G)$.
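As a small illustration of Equation (5), an as-soon-as-possible (ASAP) schedule that honors only the local edges and ignores resource limits finishes exactly at the CPL, so it meets the bound $II_S \ge CPL$ with equality. The Python sketch below assumes a hypothetical edge dictionary mapping (u, v) to δ(u → v) and a latency table standing in for $l(u)$. The recursion assumes that the local edges are acyclic, as legality requires.

```python
from collections import defaultdict


def asap_schedule(nodes, delay, latency):
    """Earliest start times S(v) that satisfy Definition 2.2 (local edges only)."""
    preds = defaultdict(list)
    for (u, v), d in delay.items():
        if d == 0:                     # only local edges constrain the schedule
            preds[v].append(u)
    start = {}

    def visit(v):
        # Assumes the local edges form a DAG (legality condition (3)).
        if v not in start:
            start[v] = max((visit(u) + latency[u] for u in preds[v]), default=0)
        return start[v]

    for v in nodes:
        visit(v)
    return start


def initiation_interval(start, latency):
    # Equation (5): latest finish time minus earliest start time.
    return max(start[u] + latency[u] for u in start) - min(start.values())
```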


2.3  Theoretical  bounds  on  Initiation  Interval 


It is clear that CPL poses a bound on the II of the graph. The resource constraints and the recurrences present in the DG also restrict the II of the steady state [19, 12, 9]. Consider a DG with $k$ multiplication operations and a resource set with $n$ multipliers; assuming multiplication takes unit time, it will take at least $\lceil k/n \rceil$ time units to schedule all the multiplication operations. The maximum of such time bounds over all resource types is the Minimum Initiation Interval (MII) due to resource constraints, represented as $MII_{res}$. In the presence of a recurrence $r$ in the DG, the steady-state execution time is lower bounded by $\lceil l(r)/\delta(r) \rceil$ time units to assure proper execution of the recurrence. The maximum of such bounds over all recurrences in the graph is the MII due to recurrences, represented as $MII_{rec}$. Mathematically,

$$MII_{res} = \max_{R_i \in \mathcal{R}} \left\lceil \frac{\sum_{u : \beta(u) = R_i} l(u)}{n_i} \right\rceil$$

$$MII_{rec} = \begin{cases} 0 & \text{if there is no recurrence} \\ \max_{r \in G} \left\lceil \dfrac{l(r)}{\delta(r)} \right\rceil & \text{otherwise} \end{cases}$$

where $R_i$ is a specific resource type from the resource set $\mathcal{R}$, $n_i$ is the number of such resources available, and $r$ is a recurrence in the DG; $l(r)$ is the sum of the latencies of all the nodes in $r$, and $\delta(r)$ is the sum of the delays of all the edges in $r$.
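Both lower bounds can be computed directly for small graphs. This Python sketch enumerates every simple cycle by depth-first search, which is exponential in the worst case and meant only for small examples; the text does not say how recurrences are enumerated. The dictionary representations (edges as (u, v) keys mapped to δ, a per-type instance count standing in for $n_i$) and all function names are hypothetical.

```python
import math
from collections import defaultdict


def simple_cycles(nodes, delay):
    """Each simple cycle once, as a closed node list [s, ..., s]."""
    order = {v: i for i, v in enumerate(nodes)}
    succ = defaultdict(list)
    for u, v in delay:
        succ[u].append(v)
    cycles = []

    def extend(start, path, on_path):
        for w in succ[path[-1]]:
            if w == start:
                cycles.append(path + [start])
            elif w not in on_path and order[w] > order[start]:
                # Only visit nodes ordered after `start`, so every cycle is
                # reported exactly once, rooted at its first node in `nodes`.
                on_path.add(w)
                extend(start, path + [w], on_path)
                on_path.remove(w)

    for s in nodes:
        extend(s, [s], {s})
    return cycles


def mii_res(latency, binding, count):
    load = defaultdict(int)
    for u, r in binding.items():
        load[r] += latency[u]             # total latency bound to resource type r
    return max(math.ceil(load[r] / count[r]) for r in load)


def mii_rec(nodes, delay, latency):
    best = 0                              # value used when there is no recurrence
    for cyc in simple_cycles(nodes, delay):
        l_r = sum(latency[v] for v in cyc[:-1])
        d_r = sum(delay[(a, b)] for a, b in zip(cyc, cyc[1:]))
        best = max(best, math.ceil(l_r / d_r))   # d_r > 0 in a legal DG
    return best


def mii(nodes, delay, latency, binding, count):
    # Definition 2.3: the larger of the two lower bounds.
    return max(mii_res(latency, binding, count), mii_rec(nodes, delay, latency))
```

For example, four unit-time tasks bound two per resource, with one instance of each resource and a four-node recurrence carrying two delays, give $MII_{res} = MII_{rec} = 2$.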

Definition 2.3 The Minimum Initiation Interval ($MII_G$) achievable by any schedule for a given graph is the maximum of the two lower bounds discussed above:

$$II_S \ge MII_G = \max(MII_{res}, MII_{rec}) \qquad \square$$


2.4  Retiming  the  Dependence  Graph 

Definition 2.4 : $r(G)$ (Retiming of a dependence graph $G$)

The retiming operation transforms the graph $G = (V, E, \lambda, \delta, \beta)$ into a new graph $G_r = (V, E, \lambda_r, \delta_r, \beta)$ such that:

$$\forall (u \to v) \in E : \delta_r(u \to v) - \delta(u \to v) = \big(\lambda_r(u) - \lambda(u)\big) - \big(\lambda_r(v) - \lambda(v)\big) \qquad \square$$

A retiming operation is legal if it always transforms a legal dependence graph $G$ into a retimed graph $G_r$ that is also legal. Recall that a legal DG is one that satisfies conditions (1), (2), and (3). In the rest of our discussion we restrict ourselves to legal retiming operations. If $G_r$ is a retimed graph of $G$ derived by a legal retiming operation, then $G_r$ is functionally equivalent to $G$ [20]. The retiming operation does not change the MII of a graph. However, retiming may introduce delays on local edges, thereby eliminating local dependencies. Eliminating local edges that belong to the critical path may reduce the CPL, which might lead to faster schedules.

Figure 1 shows an example of how retiming is used to generate pipelined schedules with better throughput. The DG has four tasks, A, B, C, and D, and two resources, R1 and R2. Tasks A and C are bound to R1, and tasks B and D are bound to R2. For simplicity we assume that all four tasks take unit time to execute. The schedule for the original graph takes 3 cycles per iteration of the loop (II = 3), while the retimed graph has an II of 2 cycles for the steady state. Also notice that after retiming we achieve a pipelined schedule (Figure 1-b), while the schedule produced for the initial graph is non-pipelined (Figure 1-a).

3 Resource-Constrained Loop Pipelining

In this section we present our algorithm, which attempts to generate an optimal resource-constrained pipelined schedule for a given dependence graph representing a loop. We consider a pipelined schedule optimal if the steady-state initiation interval of the schedule ($II_S$) is equal to the minimum initiation interval of the loop ($MII_G$) as given in Definition 2.3. Given the initial graph of the loop, we try to produce the retimed graph that, when scheduled, achieves the best possible steady-state throughput.

Figure 1: An Example of Retiming to Generate Pipelined Schedules

Since we want to achieve the best throughput, the aim of the retiming algorithm must be to eliminate as many local dependencies as possible. Figure 2 shows two examples where retiming is used to eliminate local dependencies in a DG. The underlying retiming operation used in Figure 2 is the one referred to as nodal transfer in [12], which is the same as the dependence retiming transformation presented in [9]. We shall call it the function shift_node. For a given node $v \in V$ and a positive integer $k$, the function shift_node(v, k) performs the following steps:

• $\lambda_r(v) \leftarrow \lambda(v) + k$

• $\forall (u \to v) \in E : \delta_r(u \to v) \leftarrow \delta(u \to v) - k$

• $\forall (v \to w) \in E : \delta_r(v \to w) \leftarrow \delta(v \to w) + k$

The shift_node(v, k) function transforms a given DG into an equivalent retimed DG satisfying Definition 2.4. However, shift_node(v, k) is a legal retiming operation only if, for all edges $u \to v \in E$, $\delta(u \to v) \ge k$; otherwise edges with negative delays will be created. A DG is defined to be systolic if it has no local dependencies [21], i.e. $\forall e \in E : \delta(e) > 0$. For a systolic DG it is trivially possible to obtain a schedule that is optimal, with $II_S$ equal to $MII_{res}$.


Figure 2: Retiming Transformations to Eliminate Local Dependencies


Algorithm 3.1 (Retiming an Acyclic DG)

G = (V, E, λ, δ, β) : The initial DG to be retimed.   ▷ The initial graph is acyclic
procedure retime_acyclic_DG(G)
begin
    while (∃ head node u ∈ V such that (∃ u → v ∈ E : δ(u → v) = 0)) do
        shift_node(u, 1);
    return G
end


Figure 3: Retiming a cyclic DG

If the initial graph has no cycles, then it is always possible to introduce positive delays on all its edges and achieve the optimal throughput. Algorithm 3.1 is a simple procedure that eliminates all local edges in an acyclic DG just by making calls to the shift_node() function. Figure 2-(b) illustrates the flow of this algorithm when applied to an acyclic DG. In Algorithm 3.1, since the node u is a head node, all of its incoming edges have a delay of at least 1, so shift_node(u, 1) does not create any edge with a negative delay. In the case of DGs with cycles, it is not always possible to eliminate all local edges. Consider the initial cyclic DG in Figure 3: any legal retimed graph of the initial graph always has two local data dependencies. More generally, for every cycle $c$ in the graph, $\delta(c)$ (the sum of the delays of the edges in the cycle) does not change with retiming: by Definition 2.4, the change in delay on each edge $u \to v$ is $(\lambda_r(u) - \lambda(u)) - (\lambda_r(v) - \lambda(v))$, and these terms cancel when summed around a cycle. Thus, for DGs with cyclic dependencies, there are cases where we can only shift delays around (i.e. reduce the delay on certain edges and add it to others) rather than create new delays.
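shift_node and Algorithm 3.1 can be sketched in a few lines of Python. The representation (a dictionary from (u, v) to δ(u → v) plus a dictionary of iteration indices), the function names, and the choice to raise an exception on an illegal shift are assumptions for illustration. Self loops are skipped in the legality test because shift_node both adds and removes k on them, leaving their delay unchanged.

```python
def shift_node(index, delay, v, k):
    """shift_node(v, k), applied in place; raises if it would be illegal."""
    if any(d < k for (a, b), d in delay.items() if b == v and a != v):
        raise ValueError("illegal retiming: an incoming edge has delay < k")
    index[v] = index.get(v, 0) + k
    for (a, b) in delay:
        if a == v and b == v:
            continue                  # self loop: +k and -k cancel
        if b == v:
            delay[(a, b)] -= k        # incoming edges lose k delay units
        elif a == v:
            delay[(a, b)] += k        # outgoing edges gain k delay units


def retime_acyclic_dg(nodes, delay, index):
    """Algorithm 3.1: remove every local edge of an acyclic DG."""

    def pick_head():
        # A head node with at least one zero-delay outgoing edge, or None.
        for u in nodes:
            is_head = all(d > 0 for (a, b), d in delay.items() if b == u)
            has_local_out = any(d == 0 for (a, b), d in delay.items() if a == u)
            if is_head and has_local_out:
                return u
        return None

    u = pick_head()
    while u is not None:
        shift_node(index, delay, u, 1)   # legal: no in-edge of u has delay 0
        u = pick_head()
    return delay, index
```

On a zero-delay chain A → B → C, the loop shifts A, then B, then A again, ending with one delay on each edge. Shifting a node inside a cycle only moves delay between its incoming and outgoing edges, so the cycle's total delay is unchanged.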

Although for any cycle, $c$, in the DG, $\delta(c)$ is constant over retiming, the number of positive delay edges in


Figure  4:  Relaxing  a  cyclic  DG 
Figure 5: Resource-constrained Loop Pipelining Methodology

obtain the pipelined throughput. If the throughput achieved by the scheduler is equal to $MII_G$, then the optimal steady-state throughput has been achieved and we do not proceed to phase two. We use a simple list scheduler [22] with the mobility of the nodes as the primary priority. In the second phase we pass $G'$, the output of phase one, to a retime-and-schedule algorithm. We now present the details of both phases of our algorithm.
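Phase one's scheduler is described only as a list scheduler [22] that uses node mobility as the primary priority. The Python sketch below is one possible reading, and the following assumptions are ours: mobility is the ALAP minus the ASAP start time over the local edges, each resource instance is non-pipelined and stays busy for $l(u)$ cycles, latencies are positive integers, and ties keep the order of the node list. It returns the start times of one steady state and its $II_S$ from Equation (5).

```python
from collections import defaultdict


def list_schedule(nodes, delay, latency, binding, count):
    """Schedule one steady state; return (start times, II_S)."""
    preds, succs = defaultdict(list), defaultdict(list)
    for (u, v), d in delay.items():
        if d == 0:                          # only local edges order the tasks
            preds[v].append(u)
            succs[u].append(v)

    asap, alap = {}, {}

    def early(v):
        if v not in asap:
            asap[v] = max((early(u) + latency[u] for u in preds[v]), default=0)
        return asap[v]

    for v in nodes:
        early(v)
    cpl = max(asap[v] + latency[v] for v in nodes)

    def late(v):
        if v not in alap:
            alap[v] = min((late(w) for w in succs[v]), default=cpl) - latency[v]
        return alap[v]

    for v in nodes:
        late(v)
    mobility = {v: alap[v] - asap[v] for v in nodes}

    start, finish = {}, {}
    free_at = {r: [0] * n for r, n in count.items()}   # per-instance availability
    t = 0
    while len(start) < len(nodes):
        ready = [v for v in nodes if v not in start
                 and all(u in finish and finish[u] <= t for u in preds[v])]
        for v in sorted(ready, key=lambda x: mobility[x]):   # least mobility first
            slots = free_at[binding[v]]
            for i, available in enumerate(slots):
                if available <= t:          # a free instance of v's resource type
                    start[v], finish[v] = t, t + latency[v]
                    slots[i] = finish[v]
                    break
        t += 1
    ii = max(finish.values()) - min(start.values())   # Equation (5)
    return start, ii
```

For instance, with four unit-time tasks bound two per resource (one instance each) and no local edges, the sketch schedules two tasks per cycle and reports $II_S = 2$. A chain of zero-delay edges through the same four tasks instead forces $II_S = 4$.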

3.1.1  Phase  I  Algorithm 

In the first phase we try to transform the given initial graph into an MRG. Our approach is presented in Algorithm 3.2. Before invoking the algorithm, we identify the set of edges in the DG that belong to recurrences. A directed edge from node $u$ to node $v$ belongs to the recurrence set, $\mathcal{R}$, if there exists a directed path from $v$ to $u$ (i.e. there is a cycle involving the edge $u \to v$). Mathematically, $\mathcal{R} = \{u \to v \in E \mid \exists \text{ path } v \rightsquigarrow u\}$. Edges that belong to $\mathcal{R}$ are called recurrence edges, and the rest are called non-recurrence edges. The procedure relax_DG() in Algorithm 3.2 has two while loops. The first while loop transforms all non-recurrence local edges into global edges. The second while loop tries to introduce delays on local edges that belong to $\mathcal{R}$.
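The recurrence set can be computed from reachability alone. In this Python sketch (hypothetical names; edges are keys (u, v) of a delay dictionary), an edge u → v is placed in the set when u can be reached from v, and a self loop qualifies through the empty path. Recomputing reachability for every edge is quadratic and is chosen only for brevity. A faster equivalent test is whether u and v lie in the same strongly connected component.

```python
from collections import defaultdict


def reachable_from(delay, src):
    """All nodes reachable from src, including src itself (path of length 0)."""
    succ = defaultdict(list)
    for u, v in delay:
        succ[u].append(v)
    seen, stack = {src}, [src]
    while stack:
        for w in succ[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def recurrence_edges(delay):
    # u -> v is a recurrence edge iff there is a path v ~> u.
    return {(u, v) for (u, v) in delay if u in reachable_from(delay, v)}
```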

Relaxing non-recurrence edges: This is done in the first while loop of Algorithm 3.2. Consider a zero-delay edge $u \to v$ that does not belong to any recurrence. We follow a simple approach to introduce a unit delay on this edge without decrementing the existing delay on any other edge of the graph. For all nodes $n$ belonging to the set that includes the node $u$ and all nodes from which $u$ can be reached, perform shift_node(n, 1). Delays on edges inside this set are unchanged, no edge enters the set from outside, and every edge leaving the set, in particular $u \to v$, gains one unit of delay. The retiming therefore introduces no new local dependency in the graph, and the edge $u \to v$ is no longer a local edge.

Algorithm 3.2 (Phase I - Relaxing a DG)

G = (V, E, λ, δ, β) : The initial DG to be relaxed until it is an MRG.
ℛ : The set of recurrence edges that belong to E

procedure relax_DG(G)
begin
    ▷ Eliminate all non-recurrence local edges
    while (∃ (u → v) ∈ E s.t. (δ(u → v) = 0) ∧ (u → v ∉ ℛ)) do
    begin
        for each n ∈ ({u} ∪ {k | ∃ a path k ⇝ u}) do
            shift_node(n, 1)
    end while

    ▷ Now try to eliminate local edges belonging to recurrences
    while ([u → v, shift] ← get_next_relaxable_edge(G)) do   ▷ loops until function returns NULL
    begin
        shift_node(u, shift)
        for each edge t → u ∈ E \ ℛ do
        begin
            if (δ(t → u) < 1) then
                d ← 1 − δ(t → u)
                for each n ∈ ({t} ∪ {k | ∃ a path k ⇝ t}) do
                    shift_node(n, d)
            end if
        end for
    end while
end

Figure 6 illustrates non-recurrence edge relaxing. (a) is the initial graph. The non-recurrence zero-delay edge B → D is selected to be relaxed. A and C are the nodes from which B can be reached. Hence, the shift_node(n, 1) function is performed on nodes B, A, and C. (b) is the graph obtained after all shift_node operations are performed. Notice that for all edges from one of the three nodes (A, B, C) to any of the remaining nodes, the delay on the edge is increased by one unit, so in (b) delays are introduced on the edges B → D and C → D. We continue this procedure until all local dependencies are eliminated. Figure 6-(c) shows the graph obtained after all the local dependencies are eliminated.
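The first while loop of Algorithm 3.2 can be sketched as follows. Shifting a group of nodes by k here has the same effect as calling shift_node(n, k) on each member in turn: delays on edges inside the group cancel, edges leaving it gain k, and edges entering it lose k (a node together with all of its predecessors has no entering edges). The helper names and the dictionary representation are hypothetical, and the recurrence set is passed in precomputed.

```python
from collections import defaultdict


def ancestors_and_self(delay, u):
    """{u} together with every node from which u can be reached."""
    pred = defaultdict(list)
    for a, b in delay:
        pred[b].append(a)
    seen, stack = {u}, [u]
    while stack:
        for p in pred[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def shift_set(delay, index, group, k):
    """Same effect as shift_node(n, k) for every n in group (no legality check)."""
    for n in group:
        index[n] = index.get(n, 0) + k
    for (a, b) in delay:
        if a in group and b not in group:
            delay[(a, b)] += k        # edge leaves the shifted group
        elif b in group and a not in group:
            delay[(a, b)] -= k        # edge enters the shifted group


def relax_non_recurrence(delay, index, recurrence):
    """First while loop of Algorithm 3.2 (recurrence: set of (u, v) edges)."""
    while True:
        local = [e for e, d in delay.items() if d == 0 and e not in recurrence]
        if not local:
            return delay, index
        u, _ = local[0]
        shift_set(delay, index, ancestors_and_self(delay, u), 1)
```

Zero-delay recurrence edges are deliberately left untouched here; they are handled by the second while loop.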

Relaxing recurrence edges: This is done in the second while loop of Algorithm 3.2. Consider a zero-delay recurrence edge $u \to v$ in the graph. The approach taken for non-recurrence edges will not work here, because $u$ is reachable from itself through $v$: shifting $u$ together with all the nodes from which it can be reached would also shift $v$, leaving the delay on $u \to v$ unchanged. However, if all recurrence edges incident on $u$ (excluding self loops) have a delay of at least $kD$ units, for some integer $k \ge 2$, then we can perform shift_node(u, k − 1). This introduces a positive delay on the recurrence edge $u \to v$, and all recurrence edges incident on $u$ remain positive. However, non-recurrence edges incident on $u$ may be transformed into local dependencies. These new local dependencies can be eliminated through the approach previously discussed. As stated earlier, the sum of the delays on any recurrence is constant. So, essentially, we select nodes belonging to recurrences that have excess delays on all recurrence edges incident on them and redistribute the excess delay to their outgoing edges.

Algorithm 3.3 (Select an Edge to Relax)

G = (V, E, λ, δ, β) : A node has to be selected from the input graph G
selected[e] : selected is a global boolean array. selected[e] denotes whether the edge e has already been selected. Initially all edges are marked unselected.
ℛ : The set of recurrence edges that belong to E

function get_next_relaxable_edge(G) : (u → v : E, shift : int)
begin
    while (∃ edge (u → v) ∈ ℛ such that (not selected[u → v]) ∧ (δ(u → v) = 0)) do
        E′ ← {(t → u) ∈ ℛ | t ≠ u}
        shift ← min_{e ∈ E′} δ(e)
        if (shift > 1) then
            selected[u → v] ← 1
            return (u → v, shift − 1)
        end if
    end while
    return NULL
end

The function get_next_relaxable_edge(G) selects the candidate recurrence edge to be relaxed next. The
function also returns an integer value, shift, by which the selected edge can be relaxed. The selection
function is shown in Algorithm 3.3. It selects nodes belonging to recurrences such that all
recurrence edges (excluding self loops) incident on them have a delay greater than one. If no such unselected
node exists, it returns null. The integer value shift returned by this function is one less than
the least delay on the recurrence edges mentioned above. For each edge u → v selected by the selection
function, shift_node(u, shift) is performed. Thus delays on all outgoing edges of u are increased by
shift, and delays on all edges incident on u are decreased by shift. This eliminates the local
dependency u → v. Due to the way shift was computed, positive delays are maintained on all recurrence edges
incident on u. The only local dependencies that may be created are on the non-recurrence edges incident
on u. However, these new local edges are eliminated using the technique discussed before for relaxing
non-recurrence local dependencies.
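A Python sketch of Algorithm 3.3, driven by shift_node, is given below. It is illustrative only: the edge-to-delay dictionary, the explicit set of recurrence edges, and the Figure 7-like graph in the usage lines are our own hypothetical choices (the actual delays of Figure 7 are not reproduced here). One deliberate deviation: instead of re-testing a single candidate in the while loop, the sketch scans all unselected zero-delay recurrence edges once, so a candidate whose shift is too small cannot stall the search.

```python
from typing import Optional

Edge = tuple[str, str]


def shift_node(delays: dict[Edge, int], n: str, d: int) -> None:
    """Outgoing edges of n gain d units of delay, incoming edges lose d (self loops unchanged)."""
    for (u, v) in delays:
        if u == n:
            delays[(u, v)] += d
        if v == n:
            delays[(u, v)] -= d


def get_next_relaxable_edge(delays: dict[Edge, int], recurrence: set[Edge],
                            selected: set[Edge]) -> Optional[tuple[Edge, int]]:
    """Return (u -> v, shift) for an unselected zero-delay recurrence edge whose source u
    has delay > 1 on every incoming recurrence edge (self loops excluded); else None."""
    for (u, v) in sorted(recurrence):
        if (u, v) in selected or delays[(u, v)] != 0:
            continue
        incoming = [delays[(t, w)] for (t, w) in recurrence if w == u and t != u]
        if not incoming:
            continue
        shift = min(incoming)           # least delay on the recurrence edges into u
        if shift > 1:
            selected.add((u, v))
            return (u, v), shift - 1    # leaves at least one unit on each of those edges
    return None


# Hypothetical graph loosely modeled on Figure 7: recurrence B -> C -> D -> B and a
# non-recurrence edge A -> B that carries one unit of delay.
delays = {("A", "B"): 1, ("B", "C"): 0, ("C", "D"): 1, ("D", "B"): 3}
recurrence = {("B", "C"), ("C", "D"), ("D", "B")}
selected: set[Edge] = set()

edge, shift = get_next_relaxable_edge(delays, recurrence, selected)  # (("B", "C"), 2)
shift_node(delays, edge[0], shift)  # D -> B drops to 1, B -> C rises to 2, A -> B drops to -1
shift_node(delays, "A", shift)      # restores the unit delay on A -> B
print(delays)
print(get_next_relaxable_edge(delays, recurrence, selected))  # None
```

The total delay on the recurrence stays at 4 throughout, matching the observation that relaxation only redistributes excess delay.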

Figure 7 shows an example of relaxing recurrence edges. In Figure 7-(a) the local recurrence edge B → C
is chosen to be relaxed, and the value of shift is 2 (the excess delay on the edge D → B). shift_node(B, 2)
is performed to distribute the excess delay to the local edge B → C. Notice that in Figure 7-(b) the edge
D → B now has a unit delay. In order to maintain the unit delay on the non-recurrence edge A → B,
shift_node(A, 2) is performed. The graph in (b) is an MRG, and so the selection function of Algorithm 3.3
returns null.
Figure 6: Relaxing Non-Recurrence Edges (panels: (a) initial DG; (c) all non-recurrence edges relaxed)

Figure 7: Relaxing Recurrence Edges (panel: (b) edge B → C relaxed)


3.1.2  Phase  II  Algorithm 

The phase two algorithm is invoked if the MRG obtained after phase one does not produce a schedule
that achieves the optimal steady-state throughput. Since the schedule is not optimal, the resources are
not fully utilized: there are gaps in the schedule where certain resources are idle. These gaps are created
by certain local data dependencies. We identify such dependencies and introduce delays on
them at the expense of introducing other local dependencies. The retimed graph is scheduled again, and
the process continues until either the optimal throughput is achieved or all edges have been tried. If the
optimal value is not achieved, the best throughput found is reported.

Figure 8 illustrates our phase two algorithm. The graph in (a) is the DG obtained after phase one. The
graph has four tasks. Tasks A and B are bound to processing element 2 (PE2) and have execution
times of 40 and 60 cycles, respectively. Tasks C and D are bound to processing element 1 (PE1) and
have execution times of 50 and 45 cycles, respectively. MIIres is equal to 100 (max(40 + 60, 50 + 45)).
Figure 8: Illustration of the Phase 2 algorithm (panel: (b) DG after Phase 2)

There are two recurrences in (a): A → B → C → A and A → B → D → C → A. MIIrec, the largest ratio of
the total execution time of a recurrence to its total delay, is equal to 150 (max(150/1, 195/2)). Thus MIIg
for the DG is 150 (max(150, 100)). Figure 8-(c) is the schedule obtained for the graph in (a); its throughput
is 195 cycles. We notice that PE1 is not utilized for the first 100 cycles of the schedule, which is what we
call a gap in the schedule. The gap is created by the local dependency B → C: task C has to wait until task B
completes execution. Hence, the local edge B → C is chosen to be relaxed. To create a delay on this edge,
shift_node(B, 1) is invoked, but since the edge A → B is also a local edge, shift_node(A, 1) is in turn called.
In general, shift_node() is recursively invoked until a legal DG is obtained.
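The bounds quoted for Figure 8-(a) can be checked with a few lines of Python. This is a sketch: the two recurrences are listed by hand rather than found by a cycle search, and exact fractions are used so that 195/2 = 97.5 is not rounded; whether fractional bounds are rounded up is not stated here.

```python
from fractions import Fraction

# Figure 8-(a) data as given in the text: execution times (cycles) and PE bindings.
exec_time = {"A": 40, "B": 60, "C": 50, "D": 45}
binding = {"A": "PE2", "B": "PE2", "C": "PE1", "D": "PE1"}
# Each recurrence: (tasks on the cycle, total delay on the cycle), listed by hand.
recurrences = [(["A", "B", "C"], 1), (["A", "B", "D", "C"], 2)]


def mii_res(exec_time: dict[str, int], binding: dict[str, str]) -> int:
    """Resource bound: the busiest processing element's total execution time per iteration."""
    load: dict[str, int] = {}
    for task, pe in binding.items():
        load[pe] = load.get(pe, 0) + exec_time[task]
    return max(load.values())


def mii_rec(exec_time: dict[str, int], recurrences: list[tuple[list[str], int]]) -> Fraction:
    """Recurrence bound: max over cycles of (execution time on the cycle) / (delay on the cycle)."""
    return max(Fraction(sum(exec_time[t] for t in cycle), delay) for cycle, delay in recurrences)


res = mii_res(exec_time, binding)
rec = mii_rec(exec_time, recurrences)
print(res, rec, max(res, rec))  # 100 150 150
```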

Figure 8-(b) is the DG obtained after the edge B → C is relaxed. Notice that the edge C → A is now
a local edge. The iteration indices of A and B are incremented by one. Figure 8-(d) is the schedule for
the retimed DG in (b). This schedule is a pipelined schedule representing the steady-state execution of
the loop, and it achieves the optimal steady-state throughput of 150 cycles per execution. If the
optimal solution were not achieved, the algorithm would identify the local edges causing gaps and continue
the relaxation process. A resource is considered alive until the time the last task scheduled on it completes
execution. It is a critical resource if it is alive beyond the optimal schedule time of the steady state (MIIg).
Gaps on non-critical resources are ignored. If more than one unselected local edge causes gaps,
we choose one of them based on priorities such as the criticality of the resource, the gap size, and whether
the edge belongs to the critical path.
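The gap test described above can be sketched as follows. Everything here is an assumption made for illustration: a schedule is a hypothetical mapping task → (PE, start, end), a gap is any idle interval on a resource before its last task finishes, and a local edge u → v is blamed for a gap when v starts right after the gap and u finishes exactly at that moment. The schedule in the usage lines is one possible schedule consistent with the description of Figure 8-(c) (PE1 idle for the first 100 cycles, 195 cycles in total), not the figure itself, and only the gap-size priority is implemented.

```python
Schedule = dict[str, tuple[str, int, int]]  # hypothetical: task -> (PE, start, end)


def gap_edges(schedule: Schedule, local_edges: list[tuple[str, str]],
              mii_g: int) -> list[tuple[tuple[str, str], int]]:
    """Local edges u -> v blamed for idle gaps on critical resources, largest gap first."""
    by_pe: dict[str, list[tuple[int, int, str]]] = {}
    for task, (pe, start, end) in schedule.items():
        by_pe.setdefault(pe, []).append((start, end, task))

    candidates = []
    for slots in by_pe.values():
        slots.sort()
        alive_until = max(end for _, end, _ in slots)   # alive until its last task ends
        if alive_until <= mii_g:
            continue                                    # non-critical resource: gaps ignored
        busy_until = 0
        for start, end, task in slots:
            if start > busy_until:                      # idle gap [busy_until, start)
                for (u, v) in local_edges:
                    if v == task and schedule[u][2] == start:
                        candidates.append(((u, v), start - busy_until))
            busy_until = max(busy_until, end)
    candidates.sort(key=lambda c: c[1], reverse=True)   # gap size is the only priority here
    return candidates


sched = {"A": ("PE2", 0, 40), "B": ("PE2", 40, 100),
         "C": ("PE1", 100, 150), "D": ("PE1", 150, 195)}
print(gap_edges(sched, [("A", "B"), ("B", "C"), ("B", "D")], mii_g=150))  # [(('B', 'C'), 100)]
```

PE2 finishes at cycle 100, which is within MIIg = 150, so it is non-critical; PE1 is alive until cycle 195, and its 100-cycle gap is attributed to B → C.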

UNRET [9] also uses a retime and reschedule approach. However, instead of looking for gaps in the schedule
like our phase 2 approach, it picks an unselected head node from the DG, performs shift_node on it, and
reschedules the retimed DG. The process continues until either the optimal throughput is achieved or all
nodes have been selected. Our phase 2 approach is efficient because each retiming move depends
on feedback from the schedule produced, rather than on an arbitrarily chosen head node as in [9]. The
main difference between our resource-constrained loop-pipelining methodology, presented in Figure 5, and
that in [9, 11] is the lack of phase 1 in the latter. The advantage of the relaxation scheme followed in phase
1 is that there may be no need to resort to the phase II algorithm, because the relaxed graph obtained as
the result of phase I produces the optimal schedule. Even when phase II cannot be avoided,
phase II usually converges much faster when preceded by phase I than when run alone,
because in the former case the second phase starts off with a maximally relaxed graph. In the next section
we present results to justify the above claims.

Example   Num. of   Num. of        MIIg
Number    Tasks     Dependencies   (cycles)
1         40        80             1282
2         50        150            1852
3         100       300            2788
4         100       500            3281
5         200       1000           6774
6         300       1500           8744
7         400       1600           12008
8         400       2000           13264
9         500       2000           15353
10        500       2500           15467

Table 1: Design Data for the Test Examples

4  Results 

In this section we present results of our resource-constrained loop-pipelining methodology shown in Figure
5. We compare our algorithm against the retime and schedule scheme in UNRET [9]. We have implemented
all algorithms in C++ on a Sparc 5 Unix workstation with a 143 MHz clock. We chose
UNRET for our comparison because it has been compared against several known pipelining schemes
and proved effective in [9].

We have implemented a dependence graph generator that can produce synthetic graphs of varying
complexities. The generator takes as input the number of nodes, the number of edges, the number of resources
available, the execution time range, and the maximum delay on any edge. Table 1 presents the details of the
synthesized dependence graphs that are used to study the efficiency of our methodology. To keep the scheduler
simple, we consider two resources, as in the example in Figure 8. Each task is mapped randomly to one of
the resources, and its execution time is randomly selected from the uniformly distributed interval [20, 100]
cycles. The maximum delay value on any edge is 3 delay units, and the probability of an edge being a local
dependency is 0.8. All generated graphs are legal dependence graphs. Table 1 also shows the theoretical
bound (MIIg) on the initiation interval of any pipelined execution for all ten test graphs.
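A generator with these inputs could look like the Python sketch below. The text does not describe how the generator guarantees legality, so the mechanism here is our own assumption: nodes are indexed, only edges that go forward in index order may be local, and edges going backward always receive between 1 and the maximum number of delay units. Because only forward edges can be local, the overall fraction of local edges is roughly half of p_local rather than p_local itself; matching the stated 0.8 exactly would need a biased choice of edge direction. The function name and parameters are hypothetical.

```python
import random
from typing import Optional


def generate_dg(num_nodes: int, num_edges: int, num_resources: int = 2,
                exec_range: tuple[int, int] = (20, 100), max_delay: int = 3,
                p_local: float = 0.8, seed: Optional[int] = None):
    """Return (exec_time, binding, delays) for a random legal dependence graph.

    Assumed legality trick: only forward edges u -> v with u < v may be local (zero
    delay); backward edges always get 1..max_delay units. Every cycle contains at least
    one backward edge, so every cycle carries a positive total delay."""
    if num_edges > num_nodes * (num_nodes - 1):
        raise ValueError("too many edges for a simple directed graph")
    rng = random.Random(seed)
    nodes = list(range(num_nodes))
    exec_time = {n: rng.randint(*exec_range) for n in nodes}    # uniform integers in the range
    binding = {n: rng.randrange(num_resources) for n in nodes}  # random resource per task
    delays: dict[tuple[int, int], int] = {}
    while len(delays) < num_edges:
        u, v = rng.sample(nodes, 2)                             # two distinct endpoints
        if (u, v) in delays:
            continue
        if u < v and rng.random() < p_local:
            delays[(u, v)] = 0                                  # local dependency
        else:
            delays[(u, v)] = rng.randint(1, max_delay)
    return exec_time, binding, delays


# Sizes of example 1 in Table 1 (40 tasks, 80 dependencies).
exec_time, binding, delays = generate_dg(40, 80, seed=1)
print(sum(d == 0 for d in delays.values()), "local edges out of", len(delays))
```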

Table 2 compares our loop-pipelining algorithm against that of UNRET for the 10 examples in Table 1.
Column 2 (C2) is the execution time spent in Phase I of the algorithm, C3 is the time spent in
Phase II, and C4 is the total execution time. Column 6 is the time taken by the retiming approach presented
in UNRET. All times are reported in milliseconds. Columns 5 and 7 are the cycle times (IIs) of the fastest
pipelined schedules produced by our approach and by UNRET's approach, respectively, and column 8 is the
speed-up of our approach (approximately C6/C4).

          Two-Phase Algorithm                             UNRET
Example   Phase I     Phase II    Total       IIs         Time        IIs         Speed-up
Number    Time (ms)   Time (ms)   Time (ms)   (cycles)    (ms)        (cycles)    (C6/C4)
1         1.1         0           1.1         1282*       43.8        1282*       39
2         0.8         7.8         8.6         1852*       184.3       1881        21
3         3.6         0           3.6         2788*       215.0       2788*       60
4         2.3         486.1       488.4       3519        894.5       3607        2
5         5.0         36.0        41.0        6774*       1240.0      6774*       30
6         7.9         45.7        53.6        8744*       1195.1      8744*       22
7         23.3        87.7        111.0       12008*      4196.2      12008*      38
8         13.3        60.1        73.4        13264*      5236.8      13264*      71
9         19.0        0           19.0        15353*      2645.0      15353*      139
10        21.1        394.0       415.1       15467*      8123.0      15467*      20

Table 2: Resource Constrained Loop Pipelining: Results

Values marked with * achieve the optimal throughput, i.e., they equal MIIg in Table 1. Our algorithm achieves
the optimal throughput for 9 of the 10 examples. For example 4, both approaches failed to produce the
optimal throughput.

The result we want to highlight in Table 2 is the speed-up in execution time. For the 10 examples,
our approach is on average about 44 times faster than UNRET. Only for example 4, where
both approaches fail to produce the optimal result, do we not see a substantial speed-up. UNRET is slow
because of the time it spends in the scheduler: each time a shift_node operation is done, the graph is
rescheduled, and as the size of the graph increases, scheduling becomes much slower. Our phase 2 algorithm
also uses a retime and reschedule approach like UNRET, but we differ in the way the graph is retimed.
The reason for the speed-up is the relaxation algorithm of phase I. For examples 1, 3, and 9,
phase II was not needed. For the remaining examples, although phase II was needed, it converged toward
the optimal solution much faster than UNRET. Thus, our approach is at least as effective as UNRET in terms
of the throughput achieved for a given loop, while its execution time is, on average, more than an order of
magnitude shorter.
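The headline figures can be recomputed directly from Tables 1 and 2. The short Python check below is only arithmetic on the published numbers: the mean of the per-example ratios C6/C4 is about 44, our approach matches MIIg in 9 of the 10 examples, and UNRET does so in 8 (it misses examples 2 and 4).

```python
# Table 2 columns used here: total time of the two-phase algorithm (C4), UNRET time (C6),
# achieved cycle times (C5, C7) and, from Table 1, the bound MIIg.
total_ms = [1.1, 8.6, 3.6, 488.4, 41.0, 53.6, 111.0, 73.4, 19.0, 415.1]
unret_ms = [43.8, 184.3, 215.0, 894.5, 1240.0, 1195.1, 4196.2, 5236.8, 2645.0, 8123.0]
ours_ii = [1282, 1852, 2788, 3519, 6774, 8744, 12008, 13264, 15353, 15467]
unret_ii = [1282, 1881, 2788, 3607, 6774, 8744, 12008, 13264, 15353, 15467]
mii_g = [1282, 1852, 2788, 3281, 6774, 8744, 12008, 13264, 15353, 15467]

speedups = [u / t for u, t in zip(unret_ms, total_ms)]
mean_speedup = sum(speedups) / len(speedups)
optimal_ours = sum(ii == bound for ii, bound in zip(ours_ii, mii_g))
optimal_unret = sum(ii == bound for ii, bound in zip(unret_ii, mii_g))

print(f"mean speed-up {mean_speedup:.1f}x")  # about 44x
print(optimal_ours, optimal_unret)          # 9 8
```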


5  Conclusion 

This paper presented an efficient two-phase algorithm for resource-constrained loop pipelining. Our
algorithm extensively uses retiming techniques [7] to generate pipelined schedules. The focus of our algorithm
was to achieve the best possible steady-state throughput for a given loop while expending minimal
computation time. The effectiveness of our algorithm was illustrated on several synthetically generated
dependence graphs representing loops of varying complexities. Results show that our algorithm runs
much faster than the scheme in UNRET [9] without sacrificing the quality of the steady-state
throughput. We are currently applying our loop-pipelining algorithm within a hardware/software codesign
framework to produce pipelined hardware-software codesigns.
Initial attempts at multicomponent synthesis involved carrying out high level synthesis and then
partitioning the resultant design to realize a multichip design. High level synthesis converts a behavioral
specification of a digital system into an equivalent rtl design (composed of a data path and a finite state
controller; the data path is a composition of components selected from a register-level component library)
that meets a set of stated performance constraints. This rtl design is then partitioned onto multiple chips
to realize a multicomponent design. Recent efforts in system-level synthesis have led to the development
of high level synthesis systems that can produce multichip digital systems [18, 46, 10]. These systems,
however, do not consider the impact of packaging on high level synthesis, and hence designs produced by
these systems cannot efficiently use available high performance packaging technology.

The recent and ongoing revolution in electronics packaging has resulted in many high performance packaging
technologies  such  as  thin  film  multichip  modules  (mcms).  Packaging  significantly  impacts  the  performance 
and  cost  of  systems.  High  level  synthesis  systems  can  no  longer  target  just  single  chip  designs  or  multichip 
designs  without  considering  packaging  technology.  To  make  effective  use  of  mcm  technologies,  high  level 
synthesis  systems  must  generate  multichip  structures  taking  into  account  the  impact  of  packaging  on  system 
performance,  heat,  and  cost. 

Multicomponent  Synthesis  with  Hierarchical  Package  Design  is  the  process  of  high  level  synthesis  targeting 
multichip  and/or  multicomponent  implementations  of  the  input  behavioral  specification  to  take  advantage 
of available packaging technologies. Multicomponent synthesis and hierarchical package design is characterized
by simultaneous synthesis of: (1) multiple register-level designs that interact with each other
and  together  implement  the  function  specified  in  the  input  behavioral  specification;  (2)  a  composition  of 
these  designs  into  a  hierarchical  structural  design;  and  (3)  a  mapping  of  these  register  level  designs  and 
hierarchical  structures  onto  efficient  physical  packages  to  realize  a  package  hierarchy  for  the  design. 

Hierarchical RTL Partitioning and Package Design: Traditional partitioning and package design is restricted
to a single level. A design is partitioned onto multiple packages at a particular level. However,
digital  designs  occupy  a  hierarchy  of  packages  from  bare  dies  to  boards  (or  backplanes  and  higher  as 
needed).  Also,  packages  come  in  various  sizes  with  differing  area  and  pin  capacities  and  dollar  costs.  Cost 
effective  packaging  solutions  for  designs  can  be  generated  by  carrying  out  hierarchical  partitioning  of  the 
input  RTL  description  onto  a  specified  package  library. 

Payne and van Cleemput [38] developed an automatic partitioning technique for logic gates in order to meet
gate and pin count constraints on chips. Beardslee et al. [1] developed SLIP, an environment for system-level
interactive partitioning. SLIP provides routines for maintaining and modifying a design hierarchy.
These routines are used by partitioning algorithms to update and maintain design data. Saab and Rao
[43] proposed an evolution-based approach for partitioning logic circuits. Their approach takes constraints
on the size of each part and the number of pins. It also takes testable and critical nets into account during
partitioning: testable nets are cut to make them observable, and critical nets are not cut.

Resnick designed SPARTA [39] to evaluate rtl designs with a spreadsheet-like approach. SPARTA checks
for violations of area, power, and pin count constraints. Shih, Kuh, and Tsay [44] use a clustering step to
satisfy timing constraints before using the Kernighan-Lin algorithm to partition functional blocks into a
multicomponent design targeted to multichip modules (mcm). Vemuri applies genetic algorithms to
partitioning register level designs for mcms [47, 51]. A comparison with simulated annealing based partitioning
is also presented.

Walker and Thomas [53] describe manual partitioning as part of design transformations in high level
synthesis. McFarland [25] uses a hierarchical clustering technique, based on a measure of similarity, in
partitioning behavioral hardware descriptions. These clustering algorithms are used in BUD [29] to perform
a part of the allocation and module binding phase in data path synthesis in DAA [28]. Lagnese and Thomas
use a multistage clustering technique to partition a behavioral specification into multiple processes to
improve the quality of single chip designs [23, 24]. The approach shows significant area reductions in single
chip designs, but does not consider design constraints or multichip implementations. Gupta and De Micheli
[10] use the Kernighan-Lin and simulated annealing techniques for partitioning functional models while
satisfying area and timing constraints. Pin-sharing and the area/delay characteristics of registers, multiplexers,
controllers, or wiring are not considered. Design constraints are not considered. Kucukcakar and Parker
[17, 18] describe CHOP, a framework for interactive partitioning, in which the designer creates and modifies
partitions and CHOP evaluates the validity of each partition by searching for possible implementations
through predictions. Vahid and Gajski [46] describe partitioning at the algorithmic level. Clustering and
Kernighan-Lin algorithms are used in partitioning. A preliminary bit-slice synthesis of behavioral objects
in the design is performed prior to partitioning to generate performance characteristics of synthesized
behavioral objects. Operator sharing across concurrent blocks is not considered: each concurrent block
is synthesized separately and gets a set of dedicated hardware resources. During partitioning, as the
composition of the design changes, new performance characteristics are not generated.

We develop a generic hierarchical graph partitioning and packaging model for (1) multicomponent synthesis
with hierarchical package design and (2) hierarchical rtl partitioning and package design, and propose
a generic hierarchical partitioning and package design algorithm to accomplish these tasks. We present a
generic input graph specification model for behavioral descriptions and RTL netlists (post high level
synthesis) and a model for packaging options. We then formulate the hierarchical partitioning and package
design problem and propose a solution. We first develop a mathematical model of the hierarchical
partitioning and package design problem and then map our problem domains, (1) multicomponent synthesis
with hierarchical package design and (2) partitioning register level designs onto a hierarchy of packages
(from a package library), onto the mathematical model. We then propose a solution to the hierarchical
partitioning and package design problem. We present experimental results for both approaches on some
examples using our hierarchical partitioning and package design algorithm. Finally, we present a
comparison between multicomponent synthesis and hierarchical RTL partitioning and discuss the validity and
applicability of our approach for modern designs and high performance packaging technologies.


2  Problem  Formulation 

An Example: Figure 1 shows an example graph. Consider the set of nodes of the graph, $N = \{n_1, n_2, n_3, n_4, n_5\}$.
We shall use this example to illustrate some definitions in the problem formulation. Though we present
the formulation for a generic graph, we discuss domain-specific details for multicomponent synthesis and
RTL partitioning as we present definitions.

The  problem  is  introduced  incrementally.  Definitions  2.1  and  2.2  introduce  the  concept  of  a  hierarchical 
k-level  partition  of  a  set.  Definition  2.3  extends  our  notion  of  a  k-level  partition  of  a  set  to  a  k-level 
partition  of  a  graph.  Definition  2.4  defines  a  set  of  package  levels.  Definition  2.5  outlines  a  model  for 
specifying  package  alternatives  and  their  associated  properties.  Definition  2.6  shows  the  binding  between 
a  k-level  partition  of  a  graph  and  a  set  of  package  alternatives.  The  performance  attribute  computations 
are  outlined  in  Definition  2.7. 

Definition 2.1 A 1-level partition of a set $\mathcal{N}$ is a system, $\mathcal{S}$, of nonempty sets (called segments) such that

(a) $\mathcal{S}$ is a system of mutually disjoint sets, i.e., if $C \in \mathcal{S}$, $D \in \mathcal{S}$, and $C \neq D$, then $C \cap D = \emptyset$;

(b) the union of $\mathcal{S}$ is the whole set $\mathcal{N}$, i.e., $\bigcup \mathcal{S} = \mathcal{N}$.


The set $S = \{s_1, s_2, s_3\}$ in Figure 1 defines a 1-level partition of $N$:

$s_1 = \{n_1, n_2\}$, $s_2 = \{n_3, n_4\}$, and $s_3 = \{n_5\}$.
Figure 1: An Example Graph and its k-level Partition (annotations: $N = \{n_1, \ldots, n_5\}$; $P_1 = S = \{s_1, s_2, s_3\}$ with $s_1 = \{n_1, n_2\}$, $s_2 = \{n_3, n_4\}$, $s_3 = \{n_5\}$; $P_2 = \{s_{11}, s_{12}, s_{13}\}$ with $s_{11} = \{s_1\}$, $s_{12} = \{s_2\}$, $s_{13} = \{s_3\}$; $P_3 = \{s_{21}\}$ with $s_{21} = \{s_{11}, s_{12}, s_{13}\}$)


Definition 2.2 A k-level partition, $\mathcal{P}$, of a set $\mathcal{N}$ is a set of 1-level partitions $P_1, P_2, \ldots, P_k$ such that

(a) for $1 \le i \le k-1$, $P_{i+1}$ is a 1-level partition of $P_i$;

(b) $P_1$ is a 1-level partition of $\mathcal{N}$.

The 3-level partition of $N$ (see Figure 1) is given by:

$P_1 = S = \{s_1, s_2, s_3\}$,

$P_2 = \{s_{11}, s_{12}, s_{13}\}$; $s_{11} = \{s_1\}$, $s_{12} = \{s_2\}$, and $s_{13} = \{s_3\}$, and

$P_3 = \{s_{21}\}$; $s_{21} = \{s_{11}, s_{12}, s_{13}\}$.
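Definitions 2.1 and 2.2 translate directly into membership checks. In the Python sketch below (our own illustration, with hypothetical function names), segments are frozensets so that they can themselves be elements of the next level's segments, which mirrors $s_{11} = \{s_1\}$ and $s_{21} = \{s_{11}, s_{12}, s_{13}\}$ in Figure 1.

```python
from collections.abc import Iterable


def is_one_level_partition(segments: list[frozenset], universe: Iterable) -> bool:
    """Definition 2.1: nonempty, mutually disjoint segments whose union is the universe."""
    covered: set = set()
    for seg in segments:
        if not seg or covered & seg:   # empty segment, or overlap with an earlier segment
            return False
        covered |= seg
    return covered == set(universe)


def is_k_level_partition(levels: list[list[frozenset]], universe: Iterable) -> bool:
    """Definition 2.2: P_1 partitions the universe and each P_{i+1} partitions P_i."""
    current = set(universe)
    for P in levels:
        if not is_one_level_partition(P, current):
            return False
        current = set(P)   # the segments of P_i are what level i+1 partitions
    return True


# The 3-level partition of Figure 1.
N = {"n1", "n2", "n3", "n4", "n5"}
s1, s2, s3 = frozenset({"n1", "n2"}), frozenset({"n3", "n4"}), frozenset({"n5"})
s11, s12, s13 = frozenset({s1}), frozenset({s2}), frozenset({s3})
s21 = frozenset({s11, s12, s13})
print(is_k_level_partition([[s1, s2, s3], [s11, s12, s13], [s21]], N))  # True
```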

We extend the notion of a k-level partition of a set to define the k-level partition of a graph $G = (N, E)$,
where  N  is  the  set  of  nodes  and  E  is  the  set  of  edges.  In  the  case  of  multicomponent  synthesis,  the  input 
behavioral  specification  viewed  as  a  process  graph  is  the  input  graph,  where  N  is  the  set  of  processes  and 
E  is  the  set  of  communication  signals.  In  the  case  of  rtl  partitioning,  the  graph  is  the  input  rtl  netlist, 
where  N  is  the  set  of  register  level  components  and  E  is  the  set  of  interconnections  between  register  level 
components. 


Definition 2.3 A k-level partition of a graph $G = (N, E)$ is a k-level partition of $N$, where $N$ is the set
of nodes and $E$ is the set of edges.

(a) area of a node $n \in N$ is given by $A(n)$,

(b) switching activity of a node $n \in N$ is given by $H(n)$,

(c) clock speed of execution of a node $n \in N$ is given by $T(n)$.

The performance attributes of nodes in the graph, $A(n)$, $H(n)$, and $T(n)$, are assumed to be primitive
values supplied with the graph specification. In the case of multicomponent synthesis, performance
attributes of nodes and level-1 partition segments in the graph, $A(n)$, $H(n)$, and $T(n)$, are determined
through scheduling and performance estimation of individual nodes (level-1 partition segments) (see
Section 3 and [19]). In the case of RTL partitioning, performance attributes of nodes are obtained from a
register level component library. Only the area attribute of register level components is supported at the
RTL level.

Definition 2.4 The level set, $\mathcal{L}$, is a set of $k$ natural numbers $1, 2, \ldots, k$, i.e.,

$\mathcal{L} = \{1, 2, \ldots, k\}$.
Id   Area Capacity   Switch Capacity   Pin Capacity   Speed Capacity   Cost       Level
     a(p) (sq mm)    h(p)              b(p)           t(p) (ns)        c(p) ($)   lmap(p)
p1   5               400               40             50               400        1
p2   10              400               80             50               600        1
p3   18              1000              84             50               1500       1
p4   6               600               40             50               250        2
p5   12              600               80             50               300        2
p6   20              1200              84             75               600        2
p7   40              5000              64             100              200        3
p8   60              5000              84             100              400        3

Table 1: Example of Package Alternatives


Definition 2.5

(1) $P$ is a set of package alternatives, i.e., $P = \{p_1, p_2, \ldots, p_n\}$, with
area capacity $a(p)$, switching activity capacity $h(p)$, pin capacity $b(p)$, speed capacity $t(p)$, and
cost $c(p)$ for $p \in P$.

(2) $lmap$ is a function that maps elements of $P$ to the level set:
$lmap : P \to \mathcal{L}$.

(3) The minimal elements, $P_{min}$, of $P$ are given by
$P_{min} = \{p \mid p \in P \text{ and } lmap(p) = 1\}$.

(4) The maximal elements, $P_{max}$, of $P$ are given by
$P_{max} = \{p \mid p \in P \text{ and } lmap(p) = k\}$.

(5) A relation $\prec$ is defined in $P$ such that
$p_1 \prec p_2$ iff package $p_1$ can be contained in package $p_2$, i.e.,
$lmap(p_2) = lmap(p_1) + 1$.

(6) The defining size of a package set $P$ is the package level of the maximal elements, i.e.,
defining size $= lmap(\text{maximal element}) = k$.


Table 1 shows an example set of package alternatives with area capacity, heat (switching activity) capacity,
pin capacity, speed capacity, cost, and $lmap$ defined for all its members. The defining size of this package set is three.
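To make Definition 2.5 concrete, the Python sketch below encodes Table 1 and derives $P_{min}$, $P_{max}$, the defining size, and the containment relation $\prec$. The NamedTuple layout and function names are our own; the relation is checked only through $lmap$, exactly as in item (5), so physical fit (area, pins, heat) is left to the constraints of Definition 2.8.

```python
from typing import NamedTuple


class Package(NamedTuple):
    a: int      # area capacity (sq mm)
    h: int      # switching activity (heat) capacity
    b: int      # pin capacity
    t: int      # speed capacity (ns)
    c: int      # cost ($)
    level: int  # lmap(p)


PACKAGES = {
    "p1": Package(5, 400, 40, 50, 400, 1),
    "p2": Package(10, 400, 80, 50, 600, 1),
    "p3": Package(18, 1000, 84, 50, 1500, 1),
    "p4": Package(6, 600, 40, 50, 250, 2),
    "p5": Package(12, 600, 80, 50, 300, 2),
    "p6": Package(20, 1200, 84, 75, 600, 2),
    "p7": Package(40, 5000, 64, 100, 200, 3),
    "p8": Package(60, 5000, 84, 100, 400, 3),
}


def defining_size(packages: dict[str, Package]) -> int:
    """k = lmap of the maximal elements, i.e., the highest package level present."""
    return max(p.level for p in packages.values())


def minimal_elements(packages: dict[str, Package]) -> set[str]:
    return {pid for pid, p in packages.items() if p.level == 1}


def maximal_elements(packages: dict[str, Package]) -> set[str]:
    k = defining_size(packages)
    return {pid for pid, p in packages.items() if p.level == k}


def precedes(packages: dict[str, Package], inner: str, outer: str) -> bool:
    """inner ≺ outer: inner can be contained in outer, i.e., lmap(outer) = lmap(inner) + 1."""
    return packages[outer].level == packages[inner].level + 1


print(defining_size(PACKAGES), sorted(minimal_elements(PACKAGES)), sorted(maximal_elements(PACKAGES)))
```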

To  realize  a  hierarchical  package  design,  the  k-level  partition  of  a  graph  (Definition  2.3)  needs  to  be  bound 
to  packages  from  the  available  set  of  package  alternatives  (Definition  2.5).  Definition  2.6  describes  this 
binding. 


Definition 2.6 A binding of a k-level partition of a set $\mathcal{N}$ to a set of package alternatives $P$ yields a set
of map functions $M$:

$M = \{map_1, map_2, \ldots, map_k\}$,

$map_i : P_i \to p_i$, $P_i \in \mathcal{P}$, $p_i \subseteq P$, where $p_i$ is a bag, i.e., duplicates are allowed in $p_i$, and
$\forall p \in p_i$, $lmap(p) = i$,
such that

if $S$ is a segment in $P_i$, then

$map_i(P_i) \succ map_{i-1}(S)$, i.e., $\forall p \in map_i(P_i)$ and $\forall q \in map_{i-1}(S)$, $p \succ q$.

Consider  the  3-level  partition  of  N  from  Figure  1.  A  binding  of  this  3-level  partition  of  N  to  the  set  of 
package  alternatives  from  Table  1  yields  the  following  set  of  map  functions: 
$M = \{map_1, map_2, map_3\}$,
$map_1 : P_1 \to \{p_3, p_3, p_2\}$,
$map_2 : P_2 \to \{p_6, p_6, p_5\}$, and
$map_3 : P_3 \to \{p_8\}$;

$map_3(P_3) \succ map_2(P_2) \succ map_1(P_1)$, i.e.,
$lmap(p_6) = lmap(p_3) + 1$,
$lmap(p_8) = lmap(p_6) + 1$.

To find a package design that satisfies constraints imposed by packages, rules of computation for performance
attributes of partition segments need to be developed. Definition 2.7 outlines rules of computation
to  determine  area,  switching  activity,  pins,  and  speed  of  partition  segments.  Performance  attributes  of 
partition  segments  at  higher  levels  of  packaging  are  computed  from  performance  attributes  of  constituent 
parts  at  lower  packaging  levels.  Performance  attributes  of  segments  at  level-1  are  computed  from  primitive 
attributes  of  nodes  in  the  input  graph. 

Definition 2.7 The computation rules for the physical attributes of area, heat, pins, and speed of a
segment $S$ in a 1-level partition $P_i$ (part of a k-level partition $\mathcal{P}$) are defined below:

for $2 \le i \le k$:

(a) area of segment $A(S)$ is given by:

$A(S) = \sum_{s \in S} a(map_{i-1}(s))$

(b) heat of segment $H(S)$ is given by:

$H(S) = \sum_{s \in S} H(s)$

(c) pins of segment $B(S)$ are given by:

$B(S) = \sum_{e_x \in E} e_x$, where $e_x$ spans segments $s_a$ and $s_b$; $s_a \in S$ and $s_b \in S_y$; $S_y \in P_i$, and $S \neq S_y$

(d) speed of segment $T(S)$ is given by:

$T(S) = \max_{s \in S} T(s)$;

for $P_1$:

(a) area of segment $A(S)$ is given by:

$A(S) = \sum_{n \in S} A(n)$, $n \in N$

(b) heat of segment $H(S)$ is given by:

$H(S) = \sum_{n \in S} H(n)$, $n \in N$

(c) pins of segment $B(S)$ are given by:

$B(S) = \sum_{e_x \in E} e_x$, where $e_x$ spans nodes $n_a$ and $n_b$; $n_a \in S$ and $n_b \in S_y$; $S_y \in P_1$, and $S \neq S_y$

(d) speed of segment $T(S)$ is given by:

$T(S) = \max_{n \in S} T(n)$, $n \in N$.

In  the  case  of  multicomponent  synthesis,  performance  attributes  of  level- 1  partition  segments  are  computed 
by  carrying  out  a  schedule  and  performance  estimate  step  on  each  proposed  segment.  Physical  attribute 
computation  is  shown  below  for  the  example  in  Figure  1. 
For $P_1$:

$$A(s_1) = A(n_1) + A(n_2) = 18,\quad A(s_2) = A(n_3) + A(n_4) = 17,\quad A(s_3) = A(n_5) = 10$$

$$H(s_1) = H(n_1) + H(n_2) = 700,\quad H(s_2) = H(n_3) + H(n_4) = 670,\quad H(s_3) = H(n_5) = 380$$

$$B(s_1) = 73,\quad B(s_2) = 73,\quad B(s_3) = 68$$

$$T(s_1) = \max(T(n_1), T(n_2)) = \max(100, 100) = 100,\quad T(s_2) = \max(T(n_3), T(n_4)) = \max(100, 100) = 100,\quad T(s_3) = \max(T(n_5)) = 100$$

For $P_2$:

$$A(s_{11}) = a(\mathrm{map}_1(s_1)) = a(p_3) = 18,\quad A(s_{12}) = a(\mathrm{map}_1(s_2)) = a(p_3) = 18,\quad A(s_{13}) = a(\mathrm{map}_1(s_3)) = a(p_2) = 10$$

$$H(s_{11}) = H(s_1) = 700,\quad H(s_{12}) = H(s_2) = 670,\quad H(s_{13}) = H(s_3) = 380$$

$$B(s_{11}) = 73,\quad B(s_{12}) = 73,\quad B(s_{13}) = 68$$

$$T(s_{11}) = \max(T(s_1)) = 100,\quad T(s_{12}) = \max(T(s_2)) = 100,\quad T(s_{13}) = \max(T(s_3)) = 100$$

For $P_3$:

$$A(s_{21}) = a(\mathrm{map}_2(s_{11})) + a(\mathrm{map}_2(s_{12})) + a(\mathrm{map}_2(s_{13})) = 20 + 20 + 12 = 52$$

$$H(s_{21}) = H(s_{11}) + H(s_{12}) + H(s_{13}) = 1750$$

$$B(s_{21}) = 75$$

$$T(s_{21}) = \max(T(s_{11}), T(s_{12}), T(s_{13})) = 100$$
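For levels $i \ge 2$ the area of a segment depends only on the packages its children are bound to, while heat and speed are taken from the children themselves. The sketch below reproduces the $P_2$ and $P_3$ area, heat, and speed figures above, assuming each child is described by its heat, its speed, and the name of its package (package names as in the binding of this running example). Pins are left out because they depend on the interconnection graph, which the example does not list.

```python
def compose(children, package_area):
    """Definition 2.7 rules for a segment S at level i >= 2.

    children     -- list of (H(s), T(s), package of map_{i-1}(s)) for s in S
    package_area -- package name -> area capacity a(p)
    """
    area = sum(package_area[p] for _, _, p in children)   # sum of a(map_{i-1}(s))
    heat = sum(h for h, _, _ in children)                 # sum of H(s)
    speed = max(t for _, t, _ in children)                # max of T(s)
    return area, heat, speed


# Package areas as given in the worked example.
a = {"p2": 10, "p3": 18, "p5": 12, "p6": 20}

s11 = compose([(700, 100, "p3")], a)   # s11 = {s1}, s1 bound to p3
s13 = compose([(380, 100, "p2")], a)   # s13 = {s3}, s3 bound to p2
s21 = compose([(700, 100, "p6"), (670, 100, "p6"), (380, 100, "p5")], a)
print(s11, s13, s21)  # (18, 700, 100) (10, 380, 100) (52, 1750, 100)
```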

Definition 2.8 formulates the hierarchical package design problem for an input graph G and a package set
P. The hierarchical k-level package design problem is presented below as a constraint-satisfying k-level
partitioning problem (Definition 2.3) that is bound to packages from the package library. At each level i
in the package hierarchy, the binding generated by $\mathrm{map}_i$ has to be a package from the package set such
that the performance constraints are satisfied. In addition, the cost constraint on the entire design has to be satisfied.

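Definition 2.8 Given $G = (N, E)$, a package set $P$ with defining size $k$, and a cost constraint $C$, find a
$k$-level partition $\mathcal{P} = \{P_1, P_2, \ldots, P_k\}$ of $G$ and a binding of $\mathcal{P}$ to $P$ such that,
for $1 \le i \le k$, if $S \in P_i$:

$$A(S) \le a(\mathrm{map}_i(S)),$$

$$H(S) \le h(\mathrm{map}_i(S)),$$

$$B(S) \le b(\mathrm{map}_i(S)),$$

$$T(S) \ge t(\mathrm{map}_i(S)),$$

subject to

$$\mathrm{Cost}(\mathcal{P}) = \sum_{i=1}^{k} c(\mathrm{map}_i(P_i)); \qquad \mathrm{Cost}(\mathcal{P}) \le C.$$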
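The conditions of Definition 2.8 can be checked mechanically once a binding is proposed. The function below is a sketch: the dictionary layout and the field names `a`, `h`, `b`, `t`, `c` are assumptions, and a package's cost is counted once for every segment bound to it, as in the example cost computation. The package values in the usage lines are hypothetical.

```python
def segment_fits(seg, pkg):
    """A(S) <= a(p), H(S) <= h(p), B(S) <= b(p), and T(S) >= t(p)."""
    return (seg["A"] <= pkg["a"] and seg["H"] <= pkg["h"]
            and seg["B"] <= pkg["b"] and seg["T"] >= pkg["t"])


def check_binding(levels, C):
    """levels[i-1] lists (segment attributes, bound package) pairs for P_i.

    Returns (feasible, Cost(P)); the cost is None if some segment does not fit.
    """
    total = 0
    for bound in levels:
        for seg, pkg in bound:
            if not segment_fits(seg, pkg):
                return False, None
            total += pkg["c"]          # contributes to c(map_i(P_i))
    return total <= C, total


# Hypothetical two-level binding.
small = {"a": 20, "h": 1000, "b": 84, "t": 50, "c": 900}
board = {"a": 60, "h": 5000, "b": 100, "t": 50, "c": 400}
levels = [
    [({"A": 18, "H": 700, "B": 73, "T": 100}, small),
     ({"A": 17, "H": 670, "B": 73, "T": 100}, small)],
    [({"A": 40, "H": 1370, "B": 20, "T": 100}, board)],
]
print(check_binding(levels, 2500))  # (True, 2200)
```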

A cost constraint of $5500.00 yields a solution to the k-level partitioning problem for our running example
(Figure 1), with cost $5500.00 and the following characteristics of the binding (see Figure 2).

N = {n1, n2, n3, n4, n5}
P1 = {s1, s2, s3}; s1 = {n1, n2}, s2 = {n3, n4}, s3 = {n5}
P2 = {s11, s12, s13}; s11 = {s1}, s12 = {s2}, s13 = {s3}
P3 = {s21}; s21 = {s11, s12, s13}

Figure 2: Example Solution

For $P_1$:

$$(A(s_1) = 18) \le (a(\mathrm{map}_1(s_1)) = a(p_3) = 18),\quad (A(s_2) = 17) \le (a(\mathrm{map}_1(s_2)) = a(p_3) = 18),\quad (A(s_3) = 10) \le (a(\mathrm{map}_1(s_3)) = a(p_2) = 10)$$

$$(H(s_1) = 700) \le (h(\mathrm{map}_1(s_1)) = h(p_3) = 1000),\quad (H(s_2) = 670) \le (h(\mathrm{map}_1(s_2)) = h(p_3) = 1000),\quad (H(s_3) = 380) \le (h(\mathrm{map}_1(s_3)) = h(p_2) = 400)$$

$$(B(s_1) = 73) \le (b(\mathrm{map}_1(s_1)) = b(p_3) = 84),\quad (B(s_2) = 73) \le (b(\mathrm{map}_1(s_2)) = b(p_3) = 84),\quad (B(s_3) = 68) \le (b(\mathrm{map}_1(s_3)) = b(p_2) = 80)$$

$$(T(s_1) = 100) \ge (t(\mathrm{map}_1(s_1)) = t(p_3) = 50),\quad (T(s_2) = 100) \ge (t(\mathrm{map}_1(s_2)) = t(p_3) = 50),\quad (T(s_3) = 100) \ge (t(\mathrm{map}_1(s_3)) = t(p_2) = 50)$$

For $P_2$:

$$(A(s_{11}) = 18) \le (a(\mathrm{map}_2(s_{11})) = a(p_6) = 20),\quad (A(s_{12}) = 18) \le (a(\mathrm{map}_2(s_{12})) = a(p_6) = 20),\quad (A(s_{13}) = 10) \le (a(\mathrm{map}_2(s_{13})) = a(p_5) = 12)$$

$$(H(s_{11}) = 700) \le (h(\mathrm{map}_2(s_{11})) = h(p_6) = 1200),\quad (H(s_{12}) = 670) \le (h(\mathrm{map}_2(s_{12})) = h(p_6) = 1200),\quad (H(s_{13}) = 380) \le (h(\mathrm{map}_2(s_{13})) = h(p_5) = 600)$$

$$(B(s_{11}) = 73) \le (b(\mathrm{map}_2(s_{11})) = b(p_6) = 84),\quad (B(s_{12}) = 73) \le (b(\mathrm{map}_2(s_{12})) = b(p_6) = 84),\quad (B(s_{13}) = 68) \le (b(\mathrm{map}_2(s_{13})) = b(p_5) = 80)$$

$$(T(s_{11}) = 100) \ge (t(\mathrm{map}_2(s_{11})) = t(p_6) = 75),\quad (T(s_{12}) = 100) \ge (t(\mathrm{map}_2(s_{12})) = t(p_6) = 75),\quad (T(s_{13}) = 100) \ge (t(\mathrm{map}_2(s_{13})) = t(p_5) = 50)$$

For $P_3$:

$$(A(s_{21}) = 52) \le (a(\mathrm{map}_3(s_{21})) = a(p_8) = 60)$$

$$(H(s_{21}) = 1750) \le (h(\mathrm{map}_3(s_{21})) = h(p_8) = 5000)$$

$$(B(s_{21}) = 75) \le (b(\mathrm{map}_3(s_{21})) = b(p_8) = 84)$$

$$(T(s_{21}) = 100) \ge (t(\mathrm{map}_3(s_{21})) = t(p_8) = 100)$$

$$\mathrm{Cost}(\mathcal{P}) = \sum_{i=1}^{k} c(\mathrm{map}_i(P_i)) = c(\mathrm{map}_1(P_1)) + c(\mathrm{map}_2(P_2)) + c(\mathrm{map}_3(P_3)) = \$5500$$

$$c(\mathrm{map}_1(P_1)) = c(p_3) + c(p_3) + c(p_2) = \$3600$$

$$c(\mathrm{map}_2(P_2)) = c(p_6) + c(p_6) + c(p_5) = \$1500$$

$$c(\mathrm{map}_3(P_3)) = c(p_8) = \$400$$

The cost of packaging and the cost constraint are

$$\mathrm{Cost}(\mathcal{P}) = \$5500, \qquad C = \$5500.$$

Thus, $\mathrm{Cost}(\mathcal{P}) \le C$.
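As a cross-check of the arithmetic above, the following sketch re-tests every inequality using only the numbers listed in the example and sums the per-level packaging costs. Individual package prices are not given in the text, so the level totals are used directly.

```python
# (A, H, B, T) of each segment and (a, h, b, t) of the package bound to it,
# copied from the worked example; the bound package is noted in each comment.
binding = {
    "s1":  ((18, 700, 73, 100), (18, 1000, 84, 50)),    # p3
    "s2":  ((17, 670, 73, 100), (18, 1000, 84, 50)),    # p3
    "s3":  ((10, 380, 68, 100), (10, 400, 80, 50)),     # p2
    "s11": ((18, 700, 73, 100), (20, 1200, 84, 75)),    # p6
    "s12": ((18, 670, 73, 100), (20, 1200, 84, 75)),    # p6
    "s13": ((10, 380, 68, 100), (12, 600, 80, 50)),     # p5
    "s21": ((52, 1750, 75, 100), (60, 5000, 84, 100)),  # p8
}
level_cost = {1: 3600, 2: 1500, 3: 400}   # c(map_i(P_i)) in dollars
C = 5500


def fits(seg, pkg):
    """Segment-level conditions of Definition 2.8."""
    (A, H, B, T), (a, h, b, t) = seg, pkg
    return A <= a and H <= h and B <= b and T >= t


violations = [name for name, (seg, pkg) in binding.items() if not fits(seg, pkg)]
total = sum(level_cost.values())
print(violations, total, total <= C)  # [] 5500 True
```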


3 Scheduling and Performance Estimation

Scheduling and performance estimation are important steps in high level synthesis and are used to explore
the design space [3, 27, 17, 18]. We briefly describe scheduling (see [35, 36, 37, 6] for more details) and
performance estimation (see [11, 12, 13, 30, 21, 20, 19] for more details).

Scheduling: Scheduling is the first important step in the synthesis process. The input behavioral specification
is converted into an equivalent data flow graph (DFG) representation, and scheduling operates on the DFG. DFG
operations are assigned to specific control steps and are bound to physical ALUs available in the component
library. The output of scheduling is a time-stamped and partially bound data flow graph that satisfies
user-specified constraints. Scheduling determines the execution speed of the synthesized design in terms of clock
speed and the number of clock cycles required to execute all operations. In addition, it fixes the control and
data path (ALU) architectures, and the architecture affects the performance of the design. An implementation of
Paulin's force-directed list scheduling [35, 36], extended for communicating and concurrently executing processes
[6], is used. Force-directed scheduling produces maximally fast (minimum number of control steps) schedules under
resource constraints. It tries to maximize operation concurrency, ensuring high resource utilization. Hardware
resources are shared across concurrent blocks; as a result, operations in concurrent blocks are scheduled under
global resource constraints. All operations are treated as macro operations that execute in one logical control
step; operations such as '+' and call are treated alike. Logical control steps are expanded into equivalent
physical clock steps during control generation [41]. All arithmetic, logical, and relational operations engage a
single hardware resource. Subprograms, loop, and wait modules are assumed to engage all available resources.
Hence, call operations do not share control steps with any other operation, i.e., no other operation is scheduled
in the same control step as a call operation.
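The two resource rules above (one hardware unit per arithmetic, logical, or relational operation, and a call owning an entire control step) can be illustrated with a plain resource-constrained list scheduler. This is a deliberately simpler stand-in, not Paulin's force-directed algorithm: it uses the longest path to a sink as its priority, and the operation kinds and resource counts are hypothetical.

```python
from functools import lru_cache


def list_schedule(ops, resources):
    """Resource-constrained list scheduling with exclusive call operations.

    ops       -- name -> (kind, [predecessor names]); kind "call" is special
    resources -- kind -> number of hardware units of that kind
    Returns name -> control step (1-based).
    """
    succs = {n: [] for n in ops}
    for n, (_, preds) in ops.items():
        for p in preds:
            succs[p].append(n)

    @lru_cache(maxsize=None)
    def height(n):
        # Longest path (counted in operations) from n to a sink: the priority.
        return 1 + max((height(s) for s in succs[n]), default=0)

    schedule, done, step = {}, set(), 0
    while len(done) < len(ops):
        step += 1
        ready = sorted((n for n in ops if n not in done
                        and all(p in done for p in ops[n][1])),
                       key=lambda n: (-height(n), n))
        used, placed = {}, []
        for n in ready:
            kind = ops[n][0]
            if kind == "call":
                if not placed:       # a call engages every resource,
                    placed.append(n)
                    break            # so nothing else may join this step
                continue             # otherwise defer it to a later step
            if used.get(kind, 0) < resources.get(kind, 0):
                used[kind] = used.get(kind, 0) + 1
                placed.append(n)
        if not placed:
            raise ValueError(f"no operation could be scheduled at step {step}")
        for n in placed:
            schedule[n] = step
        done.update(placed)
    return schedule


ops = {
    "a": ("+", []), "b": ("+", []), "e": ("+", []),
    "c": ("*", ["a", "b"]),
    "d": ("call", ["c"]),
}
print(list_schedule(ops, {"+": 1, "*": 1}))
# {'a': 1, 'b': 2, 'c': 3, 'e': 3, 'd': 4}
```

With a single adder, `a` and `b` serialize; `c` and `e` share step 3 because they use different units, while the call `d` gets step 4 to itself.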

Performance Estimation: For high performance packaging technology such as MCMs, power/heat dissipation in
the design is very important. An accurate performance estimator for power/heat dissipation is needed to generate
good designs. Many studies in power estimation for switch level and gate level circuits have assumed that average
power dissipation is directly proportional to the average switching activity [32, 31, 15, 4, 45, 2, 42]. In CMOS
designs, dynamic power consumption is predominant and is directly proportional to the aggregate (total) switching
activity (ASA) in the circuit. The ASA of a design is defined as the total number of circuit node switchings and
depends on the input patterns stimulating the circuit. The design is composed of components from a cell library
and a finite state controller implemented as a collection of PLAs.

We use a profile-driven approach to switching activity estimation. In this approach, event activities related
to various operations and carriers in the behavioral specification are measured by simulating the description
using user-supplied inputs. A profiler is a tool that simulates the behavioral specification with user-supplied
input patterns, called profiling stimuli. Before simulation begins, the profiler alters the behavioral
specification by inserting probes (counters) to monitor event activity in various regions of the specification.
At the end of simulation, the profiler prints the number of times each statement is executed, the number of
invocations of each function, and similar data pertaining to the event activity that occurred in the behavioral
specification during the simulation run. These event activities are then used during the synthesis process
(during performance estimation) to estimate the switching activity in the design being synthesized.
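A minimal sketch of the profiling idea, assuming a behavioral model written as a Python function: here the probes are explicit counter calls inserted by hand, whereas the profiler described above inserts them into the behavioral specification automatically. The GCD model and the probe names are hypothetical.

```python
from collections import Counter


def gcd_behavior(a, b, probe):
    """Hypothetical behavioral model with a probe at each region of interest."""
    probe("loop_test")
    while a != b:
        probe("compare")
        if a > b:
            probe("sub_a")
            a = a - b
        else:
            probe("sub_b")
            b = b - a
        probe("loop_test")
    return a


def profile(model, stimuli):
    """Run the model on every profiling stimulus and return the probe counts."""
    counts = Counter()
    for args in stimuli:
        model(*args, probe=lambda tag: counts.update([tag]))
    return counts


print(profile(gcd_behavior, [(48, 18)]))
# Counter({'loop_test': 5, 'compare': 4, 'sub_a': 3, 'sub_b': 1})
```

The resulting counts play the role of the event activities that later weight the library modules' switching figures.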

High level synthesis uses a library of parameterized register level module generators. Modules are
parameterized with respect to the number of inputs, where applicable, and the bit-width of each input. The library
contains interface descriptions of each module, a description of its parameters, and its area, delay, and
average intrinsic switching activity (ISA) characteristics. Area, delay, and ISA values of each library module
are determined by actually generating layouts for several instances of the module with different parameter
values. Determination of area and delay parameters for layout instances is straightforward: area can be
directly measured from the layout, and delay can be determined through simulation or a timing analysis
program such as Crystal [34]. We define the average intrinsic switching activity (ISA) of a module instance
as the average number of circuit nodes that are expected to switch when an input event (change of logic
values on the input lines) takes place. The ISA of a module instance is determined by extracting a switch level
model from its layout, simulating the switch level model using a very long stream of randomly generated
input patterns, and counting the average number of circuit nodes switched per pattern. Simulation
and counting continue until the average converges.
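The ISA measurement loop can be sketched as follows. As a stand-in for an extracted switch level model, this assumes a gate-level model of a 4-bit ripple-carry adder in which every gate output counts as a circuit node. Random input patterns are applied, node transitions between consecutive patterns are counted, and simulation stops once the running average changes by less than a relative tolerance between batches. The batch size, tolerance, and seed are arbitrary choices.

```python
import random


def adder_nodes(a, b, width=4):
    """Gate-level ripple-carry adder; returns the value of every gate output."""
    nodes, carry = [], 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        p = x ^ y            # propagate
        g = x & y            # generate
        s = p ^ carry        # sum bit
        t = p & carry
        carry = g | t        # carry out
        nodes += [p, g, s, t, carry]
    return nodes


def estimate_isa(circuit, width, tol=0.01, batch=1000, max_patterns=200_000, seed=1):
    """Average node transitions per random two-operand input event."""
    rng = random.Random(seed)
    prev = circuit(rng.getrandbits(width), rng.getrandbits(width))
    toggles = patterns = 0
    last_mean = None
    while True:
        for _ in range(batch):
            cur = circuit(rng.getrandbits(width), rng.getrandbits(width))
            toggles += sum(u != v for u, v in zip(prev, cur))
            prev, patterns = cur, patterns + 1
        mean = toggles / patterns
        converged = last_mean is not None and abs(mean - last_mean) <= tol * last_mean
        if converged or patterns >= max_patterns:
            return mean, patterns
        last_mean = mean


isa, n = estimate_isa(adder_nodes, width=4)
print(f"ISA ~ {isa:.2f} node switchings per input event after {n} patterns")
```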

Figure 3: Thermal Profiling of RT level Components

Overall switching activity estimation is based on using event activities to modulate the average intrinsic
switching activities of library modules used in the synthesis process. This estimate is used, in addition
to area and clock-speed estimates, to guide the synthesis process. Experimental results for a number of
examples show that switching activity estimated during synthesis deviates on average by less than 10%
from the actual switching activity measured after completing synthesis [20]. Area and delay estimation are
based on the work of Jain [11], Kurdahi [21], Mlinar [30], and Dutta [6].
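One plausible reading of "modulating" the library ISA values with profiled event activities is a weighted sum: each operation's execution count multiplies the ISA of the module instance it is bound to. The sketch below assumes that each execution corresponds to one input event on the module; the operation names, bindings, and ISA values are hypothetical.

```python
def estimate_switching(event_counts, binding, isa):
    """Profile-weighted switching activity estimate.

    event_counts -- operation -> number of executions reported by the profiler
    binding      -- operation -> library module instance chosen by synthesis
    isa          -- module instance -> average intrinsic switching activity
    """
    return sum(count * isa[binding[op]] for op, count in event_counts.items())


counts = {"add_1": 120, "mul_1": 40}            # hypothetical profile data
binding = {"add_1": "add16", "mul_1": "mul16"}  # hypothetical module choices
isa = {"add16": 35.0, "mul16": 410.0}           # hypothetical library ISA values
print(estimate_switching(counts, binding, isa))  # 20600.0
```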

Thermal Profile of RT level components: For modern high performance packaging technology such as multichip
modules (MCMs), heat dissipation in the design is a critical performance measure. For efficient utilization of
such high performance packaging technologies, the thermal constraints of packages need to be satisfied. To
evaluate the feasibility of partitions, the partitioning algorithms require accurate power/heat dissipation
figures for the register level components. Power/heat dissipation can be approximated by an estimate of the
switching activity in a design, since average power dissipation in a circuit is directly proportional to the
average switching activity. The switching activity estimation procedure counts the switching activity of nodes
in a circuit during a switch level simulation of layout/switch level models of the register transfer level
components with a characteristic set of test vectors. A characteristic set of test vectors for each component is
derived from the set of behavioral test vectors used by the designer to validate the behavioral specification
prior to synthesis.

Figure 3 demonstrates the technique of switching activity estimation. Partitions with single register level
components are generated: each register level component in the synthesized design is placed in a separate
partition. Layouts and switch level models of these single-component partitions are generated. The switch
level models are simulated with switch level test benches (generated using a test bench compiler, TBC
[49, 52]), and the number of nodes switching in each switch level model is counted. This count of node
switches gives a very accurate measure of the power/heat dissipation in the register level component. The
partitioning algorithms use this switching activity data to generate thermally sound partitions.

This process of generating switching activity measures for all register level components in the RTL design
is too time consuming. For example, a small traffic light controller example (TLC, see [19, 48]) has 49 RTL
components and is synthesized to a 4769-transistor design. Five behavioral test vectors are translated
into 1320 switch level test vectors (for each component). Complete layout generation, extraction of switch
level models, conversion of behavioral test vectors into switch level test vectors, and switch level simulation
together took about 48 hours. The layout and switch level model of every RTL component need to be
generated individually, and each of them has to be simulated at the switch level with a characteristic set
of input vectors. A handful of test vectors at the behavioral level explode into thousands of switch level
vectors. Layout generation and switch level simulation (for all components) are too time consuming for
this technique of switching activity estimation to be viable for large RTL designs.


4 Hierarchical Partitioning and Package Design Algorithm


The solution to the above problem takes the form of a hierarchical partitioning and package design algorithm
that incorporates back-tracking while tightening cost constraints on the design with each succeeding
refinement. The algorithm takes the following inputs: (1) a behavioral VHDL specification of a digital system
viewed as a process graph composed of communicating and concurrently executing processes, or alternatively an
RTL netlist composed of register level components; (2) a parameterized register level component library
characterized for area, delay, and switching activity; (3) a package library with area, pins, switching activity,
clock speed, and cost information for all packages; and (4) a cost constraint C, in dollars, on the entire design.
The algorithm begins by partitioning the process graph and mapping the partition segments (after
scheduling/performance estimation to obtain accurate performance attributes of the design) onto available
bare-die packages; alternatively, it partitions the RTL netlist and maps the segments of the partition onto
available bare-die packages. A graph is constructed from the partition generated at this level for further
partitioning at the next higher level of packaging. The packaged partition segments form the nodes of the new
graph; the edges of the graph are obtained from the interconnection of the register level designs in the
multicomponent design. At the next higher level of packaging, this new graph is partitioned and mapped onto
packages. This process continues until the packaging hierarchy is exhausted; at each level, partition segments
are mapped onto cost-effective packages. If no solution is found at a particular level, we back-track to the
previous level, tighten the cost constraints, construct a new partition, and continue.

The output of hierarchical partitioning and package design is: (1) a set of RTL designs (individual RTL
designs that together form the multicomponent design); (2) a set of structures that realize the hierarchical
design; and (3) a binding of the RTL designs and structures to appropriate cost-effective packages from the
package library at each level of packaging. The design satisfies the capacity constraints imposed by packages,
and the algorithm composes designs and picks packages such that the overall cost constraint on the design is
satisfied.

Partitioning and package design at each level involves: (1) determining the cost constraint and physical
constraints on the design: overall area and switching activity constraints on the design are derived from
the minimum capacity package at the highest level in the package hierarchy (say, the minimum area and
switching activity capacities of all available boards if the package hierarchy ended at board level), and
individual segment area, switching activity, clock speed, and pin constraints are derived from the capacity of
available packages at a particular level of packaging; (2) constructing the partition subject to the constraints
and mapping it onto a set of cost-effective packages (at level-1 in the packaging hierarchy, in the case of
multicomponent synthesis, scheduling and performance estimation are carried out on each proposed partition
segment, the performance attributes of the segment are determined, and the feasibility of the multicomponent
design and partition is checked; at other levels in the packaging hierarchy, the performance attributes of
proposed partition segments are composed from those of their constituent parts and their packaging); (3) checking
whether the constraints are satisfied and whether we need to back-track or proceed to the next higher level of
packaging; and (4) constructing the netlist for the next level and going to (1). At any level in the package
hierarchy, the cost constraint is determined by deducting the cost of packaging partitions at lower packaging
levels and the projected cost at higher packaging levels from the total cost constraint, C.

Setting Constraints: Initially, on the first pass, the overall area and switching activity constraints on the
entire design are derived from the minimum area and switching activity capacities of packages at the highest
level in the package hierarchy (since, eventually, the design hierarchy needs to be mapped onto a package at the
top level in the package hierarchy); the cost constraint is set by subtracting the cost of the smallest packages
at all levels of packaging above level-1 from C. On subsequent invocations, if the algorithm is back-tracking, a
cost overrun is computed. If the cost overrun is less than the cost of the previous level's packaging, the cost
constraint for the previous level (on a back-track) is set by subtracting the product of the cost overrun and a
cost overrun factor (COF < 1) from the cost of the previous level's packaging; if the cost overrun is greater
than the cost of the previous level's packaging, the cost constraint for the previous level is set by
multiplying the cost of the previous level's packaging by a constraint tighten factor (CTF < 1). COF and CTF
dictate the rate at which the cost constraint is tightened on a back-track. Typical values of COF are between
0.2 and 0.3, and of CTF between 0.9 and 0.95, to enable an effective search of the design space. If the
algorithm is not back-tracking, the cost constraint is generated by subtracting the actual cost of packaging at
lower levels of packaging and the projected packaging cost at higher levels (cost of the smallest packages) from
the total cost constraint, C.

Algorithm 4.1 (Set_Constraint)

P: package set, C: overall cost constraint on design
k: levels in package hierarchy, level: current level
area: overall area constraint, switch: overall switching activity constraint
cost: cost constraint at current package level
cost_level: packaging cost of the partition found at the current level
cost_{level−1}: packaging cost of the partition at the previous level
CTF: constraint tighten factor (< 1), COF: cost overrun factor (< 1)
pass: flag to generate physical constraints on design (initially 1)

Set_Constraint()
begin
  if pass = 1 then /* set physical constraints from packages at level k in package hierarchy */
    cost ← C − Σ(i = 2 to k) smallest package cost at level i
    area ← min(area capacity of packages at level k)
    switch ← min(switching activity capacity of packages at level k)
    pass ← pass + 1 /* set flag to indicate physical constraints set */
  elsif (status = SUCC) ∨ (b_track = FALSE) then
    cost ← C − Σ(i = level + 1 to k) smallest package cost at level i − Σ(i = 1 to level − 1) package cost at level i
  elsif b_track = TRUE then
    cost_over_run ← cost_level − cost
    if cost_over_run < cost_{level−1} then
      cost ← cost_{level−1} − cost_over_run × COF
    else
      cost ← cost_{level−1} × CTF
    end if
  end if
end
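The two cost rules of Algorithm 4.1 can be sketched in Python as below. The function names, the list-based inputs, and the default COF and CTF values (picked inside the typical ranges quoted above) are assumptions, and the sketch covers only the cost constraint, not the area and switching activity constraints set on the first pass. The cheapest-package costs in the usage lines are hypothetical.

```python
def forward_cost_constraint(C, level, smallest_cost, actual_cost):
    """Cost budget for `level` when the algorithm is not back-tracking.

    smallest_cost[i] -- cost of the cheapest package at level i (projected cost)
    actual_cost[i]   -- packaging cost already committed at level i < level
    Levels run from 1 to k; index 0 of both lists is unused. On the first pass
    (level 1) nothing is committed, so only the projected costs are deducted.
    """
    k = len(smallest_cost) - 1
    projected = sum(smallest_cost[i] for i in range(level + 1, k + 1))
    committed = sum(actual_cost[i] for i in range(1, level))
    return C - projected - committed


def backtrack_cost_constraint(overrun, prev_level_cost, COF=0.25, CTF=0.9):
    """Tightened budget for the previous level after a cost overrun."""
    if overrun < prev_level_cost:
        return prev_level_cost - overrun * COF
    return prev_level_cost * CTF


smallest = [0, 50, 300, 400]     # hypothetical cheapest package per level
print(forward_cost_constraint(5500, 1, smallest, [0]))               # 4800
print(forward_cost_constraint(5500, 2, smallest, [0, 3600]))         # 1500
print(backtrack_cost_constraint(overrun=200, prev_level_cost=3600))  # 3550.0
```

A small overrun trims the previous budget gently through COF, while an overrun larger than the previous level's whole packaging cost falls back to a proportional cut through CTF.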

Hierarchical Partitioning and Package Design (HPP): Algorithm 4.2 presents the hierarchical partitioning
and package design algorithm (HPP). HPP has access to a hierarchical clustering based partitioning algorithm
(HCP, Algorithm 4.3) and a multiway partitioning algorithm (MP, Algorithm 4.4). When partitioning at any level,
HPP first determines the cost, area, and switching activity constraints using Set_Constraint (Algorithm 4.1).
HPP then invokes HCP to generate a partition and a binding of its partition segments to packages from the
package library. HCP utilizes the underlying clustering in the design to quickly generate a partition. If HCP
does not find a constraint-satisfying solution, MP is invoked. MP explores a larger design space by constructing
a class of partitions; it returns the first partition that satisfies the constraints or, in the absence of a
constraint-satisfying solution, the best cost solution from the class of partitions.
Algorithm 4.2 (HPP Algorithm: HierPartPack)

G: input graph (behavioral specification/RTL netlist), P: package set
C: overall cost constraint on design, HN: hierarchical netlist manager
StatArr[k], BtkArr[k]: status of partitioning and number of back-tracks at each level
MaxBtk: user-specified limit on number of back-tracks at any level
k: levels in package hierarchy, level: current level, area: overall area constraint
switch: overall switching activity constraint, cost: cost constraint at current package level

HierPartPack(G, P, C)
begin
  level ← 1; G_1 ← G; Solution ← null
  while level ≤ k do
    Set_Constraint()
    (HcpStatus, HcpSolution) ← HCP(G_level, P(level), cost, area, switch, level)
    (status, Solution) ← (HcpStatus, HcpSolution)
    if HcpStatus ≠ SUCC then
      (status, Solution) ← MP(G_level, P(level), cost, area, switch, level)
      if (status ≠ SUCC) ∧ (cost(HcpSolution) < cost(Solution)) then
        (status, Solution) ← (HcpStatus, HcpSolution)
      end if
    end if
    StatArr[level] ← status
    case status is
      SUCC:
        level ← level + 1; HN :: read_partition(Solution)
        HN :: construct_netlist(level) /* construct netlist at next level */
      BEST:
        if (StatArr[level − 1] = SUCC) ∧ (BtkArr[level] < MaxBtk) then
          BtkArr[level] ← BtkArr[level] + 1; level ← level − 1 /* back-track */
        else
          level ← level + 1; HN :: read_partition(Solution)
          HN :: construct_netlist(level)
        end if
      FAIL:
        if (StatArr[level − 1] = SUCC) ∧ (BtkArr[level] < MaxBtk) then
          BtkArr[level] ← BtkArr[level] + 1; level ← level − 1 /* back-track */
        else return (null) end if
    end case
    G_level ← HN :: read_netlist(level) /* retrieve next level netlist */
  end while
  return (Solution)
end
Both partitioning algorithms, HCP and MP, return a status along with a solution (a partition with segments
bound to packages). The status takes one of three values, SUCC, BEST, or FAIL, describing the cases where a
constraint-satisfying solution is found (a constraint-satisfying partition with partition segments mapped
onto packages from the package library), a solution is found (a valid partition: a partition with segments
mapped onto packages), or no solution is found (no valid partition: one or more partition segments cannot
be mapped onto packages).

The status is used to decide the execution flow of HPP. If the status of partitioning is SUCC, HPP proceeds
to the next higher level of packaging; a hierarchical netlist manager (HN) is used to generate a netlist
of the newly generated partition for use at the next higher level. If, at a particular level, the status is
BEST or FAIL, then: if the previous level partition's status is SUCC, HPP back-tracks to the previous level
and generates a new partition with tighter cost constraints; if the previous level partition's status is BEST
and the current level partition's status is BEST, HPP proceeds to the next higher level of packaging; and if
the previous level partition's status is BEST and the current level partition's status is FAIL, HPP terminates,
reporting failure to find a solution. HN is used to generate the netlist for partitioning at each level. On a
recursive back-track, back-tracking continues until we reach a level where the status of partitioning is BEST.
When we encounter a status of BEST, we cannot do any better; the back-track stops, and the algorithm
proceeds to the next higher level of packaging.
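The status-driven control flow described above (and in Algorithm 4.2) reduces to a small decision rule plus a level loop. In the sketch below, `partition(level, attempt)` is a hypothetical stand-in for running HCP and then MP at one level and returning only a status; netlist construction and cost tightening are left out.

```python
SUCC, BEST, FAIL = "SUCC", "BEST", "FAIL"


def next_action(status, prev_status, backtracks, max_btk):
    """Decision rule of HPP for the partition just produced at one level."""
    if status == SUCC:
        return "advance"
    if prev_status == SUCC and backtracks < max_btk:
        return "backtrack"          # re-partition the previous level
    return "advance" if status == BEST else "fail"


def hpp(k, partition, max_btk=3):
    """Walk packaging levels 1..k; returns (StatArr or None, trace)."""
    stat, btk, attempts, trace = {}, {}, {}, []
    level = 1
    while level <= k:
        attempts[level] = attempts.get(level, 0) + 1
        status = partition(level, attempts[level])
        stat[level] = status                        # StatArr[level]
        trace.append((level, status))
        action = next_action(status, stat.get(level - 1), btk.get(level, 0), max_btk)
        if action == "advance":
            level += 1
        elif action == "backtrack":
            btk[level] = btk.get(level, 0) + 1      # BtkArr[level]
            level -= 1
        else:
            return None, trace
    return stat, trace


# Level 2 first yields only BEST, forcing one back-track to level 1.
script = {(1, 1): SUCC, (2, 1): BEST, (1, 2): SUCC, (2, 2): SUCC, (3, 1): SUCC}
stat, trace = hpp(3, lambda level, attempt: script[(level, attempt)])
print(trace)
# [(1, 'SUCC'), (2, 'BEST'), (1, 'SUCC'), (2, 'SUCC'), (3, 'SUCC')]
```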

Hierarchical Cluster-based partitioning (HCP): HCP uses hierarchical clustering [14] as its partitioning
technique; Algorithm 4.3 describes HCP. A cluster tree for the input graph is constructed using the hierarchical
clustering approach. The hierarchical clustering algorithm groups a set of objects according to some
measure of closeness [14]. The two closest objects are clustered first and considered to be a single object for
future clustering. Clustering continues by grouping two individual objects, or an object or cluster with
another cluster, on each iteration. The process stops when a single cluster is generated and a hierarchical
cluster tree is formed. Alternate partitions are constructed by traversing this cluster tree and moving the
cut-line [14, 25, 23, 24]. Figure 4 shows an example cluster tree and the different cut-lines and associated
partitions. A map function maps partition segments to available packages in the package library. Partition
segments and the entire partition are then checked for constraint satisfaction. The sum of the package costs
(for all partition segments) gives the cost of the partition. In the case of a constraint-satisfying solution
(performance and cost), the solution (partition) is returned to the hierarchical partitioning algorithm with
status SUCC. In the case of a solution (valid partition with partition segments mapped onto packages)
that does not satisfy constraints, status BEST is returned. When no solution (no valid partition: one
or more partition segments cannot be packaged) is found, FAIL is returned.
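The cluster-tree idea can be illustrated with a simplified sketch under several assumptions: closeness between two clusters is the total weight of the edges joining them, every merge step is treated as one cut-line, cut-lines are tried from the root (one segment) downward, and a segment is mapped to the cheapest package whose area capacity covers it, so only area and cost are checked. Package names, weights, areas, and costs are hypothetical.

```python
from itertools import combinations


def cluster_levels(nodes, edges):
    """Agglomerative clustering; returns the partition after each merge, finest first.

    edges -- {(u, v): weight}; closeness of two clusters is the summed weight between them
    """
    def closeness(c1, c2):
        return sum(w for (u, v), w in edges.items()
                   if (u in c1 and v in c2) or (u in c2 and v in c1))

    clusters = [frozenset([n]) for n in nodes]
    levels = [list(clusters)]
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: closeness(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        levels.append(list(clusters))
    return levels


def hcp(nodes, edges, node_area, packages, cost_limit):
    """Try each cut-line from the root down; packages: list of (name, area, cost)."""
    best, best_cost, status = None, float("inf"), "FAIL"
    for partition in reversed(cluster_levels(nodes, edges)):
        costs = []
        for seg in partition:
            area = sum(node_area[n] for n in seg)
            fitting = [c for _, cap, c in packages if cap >= area]
            if not fitting:
                break                      # segment cannot be packaged
            costs.append(min(fitting))     # cheapest package that fits
        else:
            total = sum(costs)
            if total <= cost_limit:
                return "SUCC", partition, total
            if total < best_cost:
                best, best_cost, status = partition, total, "BEST"
    return status, best, (best_cost if best else None)


nodes = ["a", "b", "c", "d"]
edges = {("a", "b"): 5, ("c", "d"): 4, ("b", "c"): 1}
node_area = {"a": 6, "b": 6, "c": 5, "d": 5}
packages = [("small", 12, 100), ("large", 30, 500)]
status, partition, cost = hcp(nodes, edges, node_area, packages, cost_limit=300)
print(status, sorted(sorted(seg) for seg in partition), cost)
# SUCC [['a', 'b'], ['c', 'd']] 200
```

The root cut (one large package at 500) exceeds the 300 budget, so the next cut-line, which splits the tree into the two tightly connected pairs, is accepted.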

Multiway  Partitioning  Algorithm  (MP):  MP  (Algorithm  4.4)  is  built  on  top  of  the  Multiway  Fiducda- 
Mattheyses  algorithm  (mfm  —  Algorithm  4.5).  MP  first  determines  the  minimum  and  maximum  number 
of  segments  that  feasible  partitions  can  have  (the  partition  is  feasible,  i.e.,  there  may  exist  a  partition  such 
that  partition  segments  can  be  effectively  bound  to  packages  from  the  package  library).  The  minimum 
number  of  segments  (min^eg)  is  determined  by;  dividing  the  area  constraint  on  the  design  by  the  area 
capacity  of  the  largest  package;  dividing  the  switching  activity  constraint  on  the  design  by  the  switching 
activity  capacity  of  the  largest  package;  and  picking  the  larger  of  the  two.  The  maximum  number  of 
segments  ( maxseg )  is  determined  as  the  number  of  nodes  in  the  input  graph  (in  the  case  of  multicomponent 
synthesis,  the  number  of  processes  in  the  input  process  graph;  alternately,  a  user  specified  limit  on  the 
number  of  RTL  components  or  the  number  of  R.TL  components  in  the  case  of  an  RTt  netlist).  MP  invokes  mfm 
to  generate  partitions  with  number  of  segments  varying  from  min_seg  to  maxjseg.  MP  returns  with  status 
SUCC  if  a  constraint  satisfying  partition  is  found.  When  a  constraint  satisfying  solution  is  not  found,  MP 
returns  the  best  solution  found  with  status  BEST.  In  the  case  of  no  valid  partitions  (one  or  more  partition 
segments  cannot  be  packaged),  MP  returns  FAIL.  Algorithm  4.5  presents  the  modified  MFM  algorithm. 
MFM  repeatedly  calls  a  K-way  Ficuccia-Mattheyses  based  partitioning  algorithm  (kway  -  Algorithm  4.6) 
to  generate  partitions.  MFM  keeps  track  of  the  best  cost  partition,  mfm  returns  a  constraint  satisfying 
partition,  if  found,  or  the  best  cost  partition,  kway  determines  area,  switching  activity,  clock  speed,  and 
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Algorithm  4.3  (HCP  Algorithm) 

CD : depth of cluster tree, P : partition at current level in cluster tree
S : segment in partition, p : package segment is mapped to
level : level in the package hierarchy, cost : cost constraint on current package level
area : overall area constraint, switch : overall switching constraint

HCP(level, G, PackageLib, cost, area, switch)
begin
  construct cluster tree (T)
  best_cost ← ∞, Solution ← null, status ← FAIL, CD ← depth(T)
  for tree_level = 1 to CD do
    constraint_satisfied ← TRUE
    for each S ∈ P do   /* individual partition segment constraints */
      if level = 1 then   /* pure behavior specification: estimate attributes */
        Schedule/Performance Estimate S and generate A(S), H(S), B(S), and T(S)
      end if
      p ← PackageLib :: map(S)   /* get package segment S fits on */
      if (p ≠ null) ∧ (A(S) ≤ A(p)) ∧ (H(S) ≤ H(p)) ∧ (B(S) ≤ B(p)) ∧ (T(S) ≥ T(p)) then
        constraint_satisfied ← constraint_satisfied ∧ TRUE
      else constraint_satisfied ← FALSE end if
    end for
    if constraint_satisfied = TRUE then   /* overall design constraints */
      if (cost(P) ≤ cost) ∧ (Area(P) ≤ area) ∧ (Switch(P) ≤ switch) then
        return (SUCC, P)
      elsif cost(P) < best_cost then
        Solution ← P, best_cost ← cost(P), status ← BEST
      end if
    end if
  end for
  return (status, Solution)
end
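The listing above can be read as the following Python sketch. It is illustrative only: the Package and Segment records, the cheapest-fit map_segment policy, and taking Area(P) as the sum of package areas and Switch(P) as the sum of segment switching activities are assumptions, and the cluster tree is supplied pre-built as one list of segments per tree level.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Package:
    name: str
    area: float    # A(p): area capacity
    switch: float  # H(p): switching-activity capacity
    pins: int      # B(p): pin capacity
    speed: float   # T(p): clock speed (ns)
    cost: float


@dataclass(frozen=True)
class Segment:
    area: float    # A(S): estimated area
    switch: float  # H(S): estimated switching activity
    pins: int      # B(S): pins needed
    speed: float   # T(S): clock speed (ns)


def fits(s: Segment, p: Package) -> bool:
    """Per-segment capacity test, mirroring the condition in Algorithm 4.3."""
    return (s.area <= p.area and s.switch <= p.switch
            and s.pins <= p.pins and s.speed >= p.speed)


def map_segment(s: Segment, lib: Sequence[Package]) -> Optional[Package]:
    """Hypothetical map(): the cheapest package the segment fits on, or None."""
    candidates = [p for p in lib if fits(s, p)]
    return min(candidates, key=lambda p: p.cost) if candidates else None


Binding = List[Tuple[Segment, Package]]


def hcp(tree_levels: Sequence[List[Segment]], lib: Sequence[Package],
        cost: float, area: float, switch: float) -> Tuple[str, Optional[Binding]]:
    best_cost, solution, status = float("inf"), None, "FAIL"
    for partition in tree_levels:          # each cluster-tree level is one partition
        packages = [map_segment(s, lib) for s in partition]
        if any(p is None for p in packages):
            continue                       # some segment cannot be packaged
        binding = list(zip(partition, packages))
        p_cost = sum(p.cost for p in packages)
        p_area = sum(p.area for p in packages)        # assumed: sum of package areas
        p_switch = sum(s.switch for s in partition)
        if p_cost <= cost and p_area <= area and p_switch <= switch:
            return "SUCC", binding
        if p_cost < best_cost:             # remember the cheapest valid partition
            best_cost, solution, status = p_cost, binding, "BEST"
    return status, solution
```

Because map_segment only ever returns a package the segment fits on, the per-segment test of Algorithm 4.3 reduces to a null check in this sketch.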
Figure 4: Example Cluster Tree and Partitions

pin constraints from packages in the package library and, using these constraints, generates a partition and
maps partition segments to packages. At the completion of KWAY, the algorithm returns a partition with
segments bound to packages in the package library.

k-way Fiduccia-Mattheyses Algorithm (KWAY): The k-way FM algorithm (KWAY, Algorithm 4.6)
starts by creating a random initial partition of n partition segments: nodes in the graph are randomly
assigned to the n segments, so that each segment gets some nodes from the set of vertices V of the graph G.
The initial partition is saved in Best, and its cost is saved as best_cost. K-way partitioning is carried
out by repeatedly invoking two-way FM (two_way_fm) on pairs of partition segments. The procedure two_way_fm
tries to improve bi-partitions by moving one node at a time from one partition segment to the other, taking
care not to violate area and switching activity constraints. The two_way_fm algorithm is based on Fiduccia
and Mattheyses's bi-partitioning algorithm [8]. two_way_fm is invoked until either a user specified limit on
the number of total iterations is exceeded, or a user specified limit on the number of iterations over which the
partition cost does not improve is exceeded. The best cost solution found during the iterations is returned as the
k-way partition.
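A minimal Python sketch of one two_way_fm pass follows, assuming unweighted edges, cut size as the cost, and a single per-segment area limit (switching activity is left out for brevity). It keeps the essential FM mechanics (move the highest-gain unlocked node, lock it, keep the best prefix of moves) but scans candidates linearly instead of using Fiduccia and Mattheyses's bucket structure, so it does not have the original algorithm's linear-time bound.

```python
def cut_size(edges, side):
    """Number of edges whose endpoints lie in different segments."""
    return sum(1 for u, v in edges if side[u] != side[v])


def two_way_fm(nodes, edges, side, area, area_limit):
    """One Fiduccia-Mattheyses-style pass between segments 0 and 1.

    side maps each node to 0 or 1 and is updated in place; the return value
    is the reduction in cut size achieved by the kept prefix of moves.
    """
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    load = [0.0, 0.0]
    for n in nodes:
        load[side[n]] += area[n]

    def gain(n):
        # cut edges that would become uncut minus uncut edges that would become cut
        external = sum(1 for m in adj[n] if side[m] != side[n])
        return external - (len(adj[n]) - external)

    locked, moves = set(), []
    total = best_total = best_len = 0
    while True:
        # unlocked nodes whose move keeps the destination within its area limit
        cands = [n for n in nodes if n not in locked
                 and load[1 - side[n]] + area[n] <= area_limit]
        if not cands:
            break
        n = max(cands, key=gain)           # greedy: highest-gain legal move
        total += gain(n)
        load[side[n]] -= area[n]
        side[n] = 1 - side[n]
        load[side[n]] += area[n]
        locked.add(n)
        moves.append(n)
        if total > best_total:
            best_total, best_len = total, len(moves)
    for n in moves[best_len:]:             # roll back moves made after the best prefix
        side[n] = 1 - side[n]
    return best_total
```

Allowing temporarily worse moves and then rolling back to the best prefix is what lets FM escape the local minima a purely greedy swap would get stuck in.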

Multicomponent Synthesis: Multicomponent synthesis is carried out when the input is a behavioral
specification. The HCP and MP algorithms carry out multicomponent synthesis at level-1 in the package hierarchy,
by synthesizing individual partition segments at level-1. Design tradeoffs are performed by considering
various partitions, carrying out scheduling and performance estimation on the proposed partition segments,
determining performance attributes of the synthesized RTL designs, and determining whether they satisfy
the capacity and cost constraints imposed by the available packages. Also, a global controller is automatically
placed on a partition segment and interconnected with the RTL design segments; it is placed on the
partition segment whose package has the most space to fit the controller. HCP (Algorithm 4.3) considers
different partitions by traversing the cluster tree, where each level in the cluster tree represents a different
partition (see Figure 4). At level-1, every time a new partition is considered, HCP carries out scheduling and
performance estimation on each of the proposed partition
Algorithm 4.4 (Multiway Partitioning Algorithm: MP)

G: input graph, P: package set, p: individual package from P
area: overall area constraint, switch: overall switch activity constraint
C: cost constraint on design, level: level in package hierarchy

MP(G, P, C, area, switch, level)
begin
  min_seg ← max(area ÷ max_area(p), switch ÷ max_switch(p))
  max_seg ← num_cell(G)   /* number of nodes in graph */
  best_cost ← ∞, status ← FAIL, Solution ← null
  for num_seg = min_seg to max_seg do
    (status, TempSolution) ← MFM(G, P, num_seg, C, area, switch, level)
    if status = SUCC then
      return (status, TempSolution)
    elsif (status = BEST) ∧ (cost(TempSolution) < best_cost) then
      Solution ← TempSolution, best_cost ← cost(TempSolution)
    end if
  end for
  return (status, Solution)
end
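As a rough illustration, MP's outer loop might look as follows in Python. The mfm argument is a hypothetical stand-in for Algorithm 4.5 that returns (status, partition, cost); rounding min_seg up to an integer is an assumption, and the final return follows the prose description of MP (BEST if any valid partition was found, otherwise FAIL) rather than echoing the status of the last MFM call.

```python
import math
from typing import Callable, Optional, Tuple

# Hypothetical MFM interface: segment count -> (status, partition, cost),
# with status one of "SUCC", "BEST", "FAIL".
MfmFn = Callable[[int], Tuple[str, Optional[object], float]]


def mp(num_cells: int, area: float, switch: float,
       max_area: float, max_switch: float, mfm: MfmFn):
    # Lower bound on segment count from area and switching capacities (rounded up).
    min_seg = max(1, math.ceil(area / max_area), math.ceil(switch / max_switch))
    max_seg = num_cells                    # at most one node per segment
    best_cost, solution = math.inf, None
    for num_seg in range(min_seg, max_seg + 1):
        status, part, cost = mfm(num_seg)
        if status == "SUCC":
            return status, part            # first constraint-satisfying partition
        if status == "BEST" and cost < best_cost:
            best_cost, solution = cost, part
    # BEST if any valid partition was seen, otherwise FAIL.
    return ("BEST", solution) if solution is not None else ("FAIL", None)
```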


segments (to compute performance attributes of the RTL design) and then tries to map these segments
onto packages from the package library. Multicomponent synthesis in MP occurs in KWAY (Algorithm 4.6).
At level-1, whenever a new partition is constructed, scheduling and performance estimation are carried out
on individual partition segments: in Algorithm 4.6 a schedule/performance estimate step is carried out
when the initial partition is generated and every time a new partition is generated. Through scheduling
and performance estimation, we predict the performance characteristics of the individual synthesized RTL
designs and of the entire multicomponent design.

At the end of multicomponent synthesis and hierarchical package design we have a multicomponent design
composed of interacting RTL design segments: the multicomponent synthesis phase produces multiple
behavior segments that are completely synthesized to RTL designs using a high level synthesis system such
as DSS [40, 41]. Also produced is a hierarchical structural design (the leaf nodes in this design are the
individual RTL designs) that is mapped onto efficient, cost-effective packages from a package library.

An Example: We illustrate the HPP algorithm (Algorithm 4.2) through an example. The graph in
Figure 1 is partitioned onto the package set specified in Table 1 (to generate a hierarchical design that is
mapped onto cost-effective packages). Hierarchical partitioning and package design generates a package
hierarchy in addition to a multichip design for the input specification.

Let the user specified cost constraint, C, be $5000. First, the overall area and switching activity
constraints are determined from the capacity of the smallest package at the highest package level (since
the design hierarchy will eventually be mapped onto a package at the highest level in the package hierarchy):
the overall area and switching activity constraints are set from the area and switching activity capacities
of p7 at level-3, which has an area capacity of 60 sq mm and a switching activity capacity of 5000. The cost
constraint on level-1 packaging is obtained by subtracting the projected packaging costs at levels 2 and 3 from
C, i.e., by subtracting the cost of the smallest package at each of these levels from C. The cost constraint
on level-1 packaging is therefore $4550 (5000 - 200 - 250). Set-Constraint (Algorithm 4.1) is used to set the area
Algorithm 4.5 (Multiway Fiduccia-Mattheyses Algorithm: MFM)

G: input graph, P: package set, C: cost constraint on design
A: overall area constraint, S: overall switching activity constraint
n: number of segments, level: level in package hierarchy

check_constraint(S)
begin
  status ← BEST, tot_area ← 0, tot_switch ← 0, tot_cost ← 0
  for all s_i ∈ S do   /* segments in partition */
    if map(s_i) = null then   /* partition segment not mapped to package */
      return (FAIL)
    end if
    tot_area ← tot_area + area(map(s_i))
    tot_switch ← tot_switch + switch(s_i)
    tot_cost ← tot_cost + cost(map(s_i))
  end for
  if (tot_area ≤ A) ∧ (tot_cost ≤ C) then
    status ← SUCC
  end if
  return (status)
end

MFM(G, P, n, C, A, S, level)
begin
  Best ← KWAY(G, P, n, level)   /* generate first partition */
  num_fm_ite ← 1, num_fm_imp ← 1, status ← check_constraint(Best)
  while (status ≠ SUCC) ∧ (num_fm_ite < MAX_FM_ITE) ∧ (num_fm_imp < MAX_FM_IMP) do
    S ← KWAY(G, P, n, level), status ← check_constraint(S), num_fm_ite ← num_fm_ite + 1
    best_cost ← cost(Best)
    if (status = SUCC) ∨ ((status = BEST) ∧ (cost(S) < best_cost)) then
      Best ← S
    end if
    if cost(S) < best_cost then num_fm_imp ← 1
    else num_fm_imp ← num_fm_imp + 1 end if
  end while
  return (status, Best)
end
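The sketch below mirrors check_constraint and MFM in Python under a simplified, assumed data model: a partition is a list holding (package_area, package_cost) for each mapped segment and None for an unmapped one, and kway is any zero-argument callable that returns a fresh partition. As in the listing, the switching total does not enter the acceptance test.

```python
import math
from typing import Callable, List, Optional, Tuple

# Assumed representation: one entry per segment, (package_area, package_cost)
# if the segment was mapped to a package, or None if it was not.
Partition = List[Optional[Tuple[float, float]]]


def check_constraint(part: Partition, A: float, C: float) -> str:
    if any(m is None for m in part):
        return "FAIL"                      # some segment has no package
    tot_area = sum(a for a, _ in part)
    tot_cost = sum(c for _, c in part)
    return "SUCC" if tot_area <= A and tot_cost <= C else "BEST"


def cost(part: Partition) -> float:
    """Total package cost; infinite when some segment is unmapped."""
    if any(m is None for m in part):
        return math.inf
    return sum(c for _, c in part)


def mfm(kway: Callable[[], Partition], A: float, C: float,
        max_fm_ite: int, max_fm_imp: int) -> Tuple[str, Partition]:
    best = kway()                          # generate first partition
    status = check_constraint(best, A, C)
    num_fm_ite, num_fm_imp = 1, 1
    while status != "SUCC" and num_fm_ite < max_fm_ite and num_fm_imp < max_fm_imp:
        s = kway()                         # a fresh k-way partition
        status = check_constraint(s, A, C)
        num_fm_ite += 1
        best_cost = cost(best)
        if status == "SUCC" or (status == "BEST" and cost(s) < best_cost):
            best = s
        num_fm_imp = 1 if cost(s) < best_cost else num_fm_imp + 1
    return status, best
```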
Algorithm 4.6 (k-way FM Algorithm: KWAY)

G: graph G = (V, E), V is a set of vertices and E is a set of edges
P: set of packages, S: {s_1, s_2, ..., s_n} a partition of G with n segments

KWAY(G, P, n, level)
begin
  Best ← initialize()   /* create initial partitions */
  if level = 1 then   /* pure behavior specification: estimate attributes */
    for all s ∈ Best do
      Schedule/Performance Estimate s and generate A(s), H(s), B(s), and T(s)
    end for
  end if
  best_cost ← 0, S ← null, cont_part ← TRUE, ite_cnt ← 1, imp_cnt ← 1
  for all s ∈ Best do   /* map partition segment to package and find cost */
    best_cost ← best_cost + cost(map(s))
  end for
  while cont_part = TRUE do
    for i = 1 to n-1 do
      for j = i+1 to n do
        two_way_fm(s_i, s_j)
      end for
    end for
    if level = 1 then   /* pure behavior specification: estimate attributes */
      for all s ∈ S do
        Schedule/Performance Estimate s and generate A(s), H(s), B(s), and T(s)
      end for
    end if
    curr_cost ← 0
    for all s ∈ S do   /* map partition segment to package and find cost */
      curr_cost ← curr_cost + cost(map(s))
    end for
    ite_cnt ← ite_cnt + 1
    if curr_cost < best_cost then
      imp_cnt ← 1, Best ← S, best_cost ← curr_cost   /* save best partition seen so far */
    else imp_cnt ← imp_cnt + 1 end if
    if (ite_cnt = MAX_ITE) ∨ (imp_cnt = IMP_CNT) then cont_part ← FALSE end if
  end while
  return (Best)   /* retrieve best partition */
end
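A Python sketch of the KWAY control loop follows. The two_way argument stands in for two_way_fm and the cost argument for the summed package cost; both are caller-supplied assumptions, as is the seed parameter used to make the random initial partition reproducible. The level-1 schedule/performance estimation steps are omitted, and the stopping test uses >= rather than = so that small limits still terminate.

```python
import random
from typing import Callable, Dict, Hashable, Sequence

Assignment = Dict[Hashable, int]   # node -> segment index in 0..n-1


def kway(nodes: Sequence[Hashable], n: int,
         cost: Callable[[Assignment], float],
         two_way: Callable[[Assignment, int, int], None],
         max_ite: int, imp_limit: int, seed: int = 0) -> Assignment:
    rng = random.Random(seed)
    part = {v: rng.randrange(n) for v in nodes}     # random initial partition
    best, best_cost = dict(part), cost(part)
    ite_cnt, imp_cnt = 1, 1
    while True:
        for i in range(n - 1):                      # every pair of segments
            for j in range(i + 1, n):
                two_way(part, i, j)                 # e.g. one FM pass on (i, j)
        ite_cnt += 1
        curr_cost = cost(part)
        if curr_cost < best_cost:                   # save best partition so far
            best, best_cost, imp_cnt = dict(part), curr_cost, 1
        else:
            imp_cnt += 1
        if ite_cnt >= max_ite or imp_cnt >= imp_limit:
            return best
```

Copying the assignment with dict() when a new best is found keeps later in-place moves from corrupting the saved partition.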
Figure 5: Level-1 Partition, First Pass (Actual Cost = $3600; Actual Area = 46 sq. mm; Actual Switch = 1750)


and  switching  activity  constraints  on  the  entire  design  and  the  cost  constraint  for  level-1. 
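The level-1 cost budget in this example is plain arithmetic; a small Python sketch (the function name is hypothetical) makes the rule explicit.

```python
from typing import Sequence


def level1_cost_constraint(total: float, cheapest_above: Sequence[float]) -> float:
    """Reserve the cheapest package cost at every higher level; the rest is level-1's budget."""
    return total - sum(cheapest_above)


print(level1_cost_constraint(5000, [200, 250]))  # 4550
```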

Having determined the cost, area, and switching activity constraints on the partition at level-1, the next
step is to construct a partition. HPP invokes HCP to generate a partition and a binding of its partition
segments to packages from the package library. If HCP does not find a constraint satisfying solution, MP
is invoked. MP first determines the minimum and maximum number of segments (min_seg and max_seg).
Feasible partitions with 3, 4, and 5 segments can be generated for the design; partitions with 1 and 2
segments are not feasible because no package at level-1 has sufficient area or switching capacity. After
determining the minimum and maximum number of segments in feasible partitions, MP invokes MFM to
generate partitions with the number of segments varying from min_seg to max_seg. MFM calls a k-way
Fiduccia-Mattheyses based partitioning algorithm (KWAY, Algorithm 4.6) to generate partitions, keeps track
of the best cost partition, and returns a constraint satisfying partition, if found, or else the best cost partition.

When HPP starts the process of hierarchical partitioning and package design (entering the while loop in
Algorithm 4.2), it invokes MP with the input graph (in the case of multicomponent synthesis, a process
graph; alternately, an RTL netlist), the set of packages available at level-1, a cost constraint ($4550), an area
constraint (60 sq mm), and a switching activity constraint (5000). Figure 5 illustrates this state and the
level-1 partition. Partition segments are marked by dashed lines, and the package each segment is mapped
onto is indicated by text within the segment. A three segment partition with cost $3600, area
46 sq mm, and switching activity 1750 is generated. This partition satisfies the area, switching activity, and
cost constraints, and thus MP returns a SUCC status. HPP then uses the hierarchical netlist manager (HN)
to read the generated partition and construct a netlist for partitioning at level-2.

Following level-1 partitioning, Set-Constraint is invoked to set the cost constraint for the level-2 partition.
HPP then invokes MP with the new netlist (generated from the level-1 partition), the set of packages available
at level-2, a cost constraint ($1200), and area and switching activity constraints (60 sq mm, 5000). Figure 6
illustrates the level-2 partition. A three segment partition with cost $1500, area 52 sq mm, and switching
activity 1750 is generated. This partition satisfies the area and switching activity constraints, but violates
Figure 6: Level-2 Partition, First Pass (Actual Cost = $1500; Actual Area = 52 sq. mm; Actual Switch = 1750)

the cost constraint; thus MP returns a BEST status. HPP now back-tracks to level-1 and starts the second
pass (a new pass starts every time HPP back-tracks to level-1).

On the back-track, HPP tightens the cost constraint using Set-Constraint. Assuming a cost overrun factor
(COF) of 1, the new cost constraint for the level-1 partition is $3300 (as computed by Set-Constraint).
HPP re-invokes MP on the RTL netlist (the area and switching activity constraints and the set of packages
available at level-1 stay the same). Figure 7 shows the new level-1 partition. A four segment
partition with cost $3300, area 48 sq mm, and switching activity 1750 is generated. This partition satisfies
the area, switching activity, and cost constraints, and MP returns a SUCC status. HPP then generates a new
netlist for level-2 using HN.

For the level-2 partition, HPP invokes MP with the new netlist and a cost constraint of $1500. Figure 8
shows the second pass level-2 partition. A three segment partition with cost $1500, area 52 sq mm,
and switching activity 1750 is generated. MP returns a SUCC status because the area, switching activity, and cost
constraints are satisfied, and HPP uses HN to generate a netlist for level-3.

A cost constraint of $200, an area constraint of 60 sq mm, a switching activity constraint of 1750, and a pin
constraint of 75 are considered for the level-3 partition. Figure 9 shows the second pass level-3 partition.
A one segment partition with cost $400, area 60 sq mm, and switching activity 1750 is generated. MP
returns a BEST status because the cost constraint is not satisfied, and HPP now back-tracks to level-2.

At level-2, HPP invokes MP with a cost constraint of $1300 (as determined by Set-Constraint). Figure 10
shows the second pass level-2 partition on a back-track. A three segment partition with cost $1500 is
generated. MP returns a BEST status because the cost constraint is not satisfied, and HPP now back-tracks
to level-1 and begins the third pass.

The third complete pass begins at level-1 with a cost constraint of $3100. Figure 11 shows the third pass
level-1 partition. A five segment partition with cost $3000, area 50 sq mm, and switching activity 1750 is
generated. MP returns SUCC, and HPP proceeds to level-2.
Figure 7: Level-1 Partition, Second Pass (Back-track) (Area Constraint = 60 sq. mm; Switch Constraint = 5000; Actual Cost = $3300; Actual Area = 48 sq. mm; Actual Switch = 1750)
Figure 8: Level-2 Partition, Second Pass (Overall Cost Constraint = $5000; Level-2 Cost Constraint = $1500; Area Constraint = 60 sq. mm; Switch Constraint = 5000; Actual Cost = $1500; Actual Area = 52 sq. mm; Actual Switch = 1750)
Figure 9: Level-3 Partition, Second Pass (Area Constraint = 60 sq. mm; Switch Constraint = 5000; Actual Cost = $400; Actual Area = 60 sq. mm; Actual Switch = 1750)
Figure 10: Level-2 Partition, Second Pass (Back-track) (Level-2 Cost Constraint = $1300; Area Constraint = 60 sq. mm; Switch Constraint = 5000; Actual Cost = $1500; Actual Area = 52 sq. mm; Actual Switch = 1750)
Figure 11: Level-1 Partition, Third Pass (Level-1 Cost Constraint = $3100; Area Constraint = 60 sq. mm; Switch Constraint = 5000; Actual Cost = $3000; Actual Area = 50 sq. mm; Actual Switch = 1750)


HPP invokes MP with a cost constraint of $1600 for the third pass level-2 partition. Figure 12 shows the
third pass level-2 partition. A three segment partition with cost $1500, area 52 sq mm, and switching
activity 1750 is generated. MP returns SUCC, and HPP proceeds to level-3.

At level-3, MP is invoked with a cost constraint of $500. Figure 13 shows the third pass level-3 partition.
A one segment partition with cost $400, area 60 sq mm, and switching activity 1750 is generated. MP
returns SUCC. This exhausts the package hierarchy, since there is no level-4 in the package library.

At this point HPP terminates the hierarchical partitioning process and returns the hierarchical design along
with the generated package hierarchy (Figures 11, 12, and 13). The input RTL design has been successfully
mapped onto a hierarchy of packages, and a constraint satisfying solution has been found: the overall cost
constraint of $5000 on the design is satisfied by a solution with cost $4900 ($3000 + $1500 + $400). At each
level in the package hierarchy, partition segments have been mapped onto available packages while ensuring
that the capacity constraints of the packages are satisfied.
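The third-pass figures quoted in the example can be checked in a few lines of Python; the numbers are taken from the text, and the dictionary layout is only for illustration.

```python
# Third-pass results from the example: (cost constraint, actual cost) per level.
third_pass = {"level-1": (3100, 3000), "level-2": (1600, 1500), "level-3": (500, 400)}

per_level_ok = all(actual <= limit for limit, actual in third_pass.values())
total_cost = sum(actual for _, actual in third_pass.values())

print(per_level_ok, total_cost, total_cost <= 5000)  # True 4900 True
```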

Discussion: One case the above example does not exercise is MP returning a FAIL status to HPP. A FAIL
status indicates that no valid partition of the design exists at this level (i.e., for all feasible partitions,
at least one of the partition segments could not be mapped onto a package at this level in the package
hierarchy). HPP then checks whether the status of the previous level's partition was SUCC and, if it was,
HPP back-tracks. SUCC at the previous level indicates that there is room for improvement at the previous
level, and hence the possibility of a valid solution at this level as a result of that improvement. If the
status of the previous level's partition is BEST, there is no room for improvement at the previous level,
and HPP terminates, reporting failure to find a solution.

Another case we did not observe is what happens when MP returns BEST at level-1. If BEST is also
returned at level-2, HPP continues the partitioning process up the hierarchy. No back-track is attempted,
because a BEST status at level-1 indicates that the partition returned is the best cost solution found and
cannot be improved. If SUCC is returned at level-2, HPP could potentially
Figure 12: Level-2 Partition, Third Pass (Area Constraint = 60 sq. mm; Switch Constraint = 5000; Actual Cost = $1500; Actual Area = 52 sq. mm; Actual Switch = 1750)

Figure 13: Level-3 Partition, Third Pass (Area Constraint = 60 sq. mm; Switch Constraint = 5000; Actual Cost = $400; Actual Area = 60 sq. mm; Actual Switch = 1750)
back-track to level-2 if level-3 returns BEST or FAIL. If FAIL is returned at level-2, HPP terminates the
partitioning, reporting failure to find a solution. In all cases, however, an inferior solution at level-1
(a solution other than the best cost solution) could conceivably lead to a better overall solution. But in
VLSI packaging the highest costs are incurred primarily at level-1 (and, for some advanced high performance
packaging technologies such as MCMs, at level-2), so it is very unlikely that an inferior solution at level-1
would lead to a better overall solution.
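One reading of these rules, combined with the back-tracks seen in the example, is sketched below in Python. The action names and the treatment of an exhausted MaxBtk budget (continue upward on BEST, stop on FAIL) are assumptions, not taken from Algorithm 4.2.

```python
from typing import Optional


def next_action(status: str, prev_status: Optional[str],
                backtracks: int, max_btk: int) -> str:
    """Next HPP step after MP returns `status` at the current level.

    prev_status is MP's status at the level below (None at level-1).
    """
    if status == "SUCC":
        return "ADVANCE"                   # move up to the next package level
    # BEST or FAIL: back-track only if the level below still has room to improve.
    if prev_status == "SUCC" and backtracks < max_btk:
        return "BACKTRACK"
    return "ADVANCE" if status == "BEST" else "TERMINATE"
```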

The user controls the amount of back-tracking by setting MaxBtk. Another way the user can control the
amount of back-tracking is by setting the initial cost constraint to zero. When the initial cost constraint is
zero, MP is constrained to always look for the best cost solution (status BEST) at all levels in the package
hierarchy. Typically, we find that the solution converges very quickly and we back-track only 2-3 times
(see Section 5).


5  Results 

We  present  results  for  a  number  of  examples  to  demonstrate  the  validity  of  our  approach  for  multicomponent 
synthesis  and  hierarchical  package  design.  Details  of  the  package  library  are  shown  in  Table  2.  Data  about 
area,  pin,  switching  activity,  and  clock  speed  constraints  supported  by  each  package  and  package  cost  are 
presented.  We  briefly  describe  the  example  behavioral  specifications.  Table  3  presents  some  details  on  the 
number  of  lines  of  code  and  number  of  processes  for  each  of  our  examples. 

Move  Machine:  The  Move  Machine  was  suggested  by  Ivan  Sutherland  based  on  the  observation  that 
conventional  processing  units  spend  much  of  their  time  moving  arguments  from  memory  to  CPU  and 
moving  results  from  CPU  to  memory.  The  instruction  set  of  the  Move  Machine  merely  controls  instruction 
and  data  flow;  it  does  not  compute  any  data  values.  Instead,  certain  memory  locations  are  (assumed  to 
be)  connected  to  external  computational  units  which  perform  the  actual  computations.  Paul  Drongowski 
provided  an  ISPS  description  of  a  Move  Machine  in  [5].  The  VHDL  for  this  example  was  written  by  Jay  Roy 
[48, 41]. The description consists of three VHDL processes (fetch, decode and execute) and several internal
variables,  signals  and  ‘wait’  statements. 

Fifo: A producer-consumer problem description written using three communicating processes (PRODUCER,
CONSUMER, and FIFO). It has five input ports (enqueue, dequeue, a, b, and c) and four
output ports (data_ready, z, overflow, and underflow). All data signals and ports (a, b, c, and z) are four
bits wide, and all controls (enqueue, dequeue, data_ready, overflow, and underflow) are one-bit signals.
When enqueue goes high, the values on ports a and b are added and stored in the queue. When dequeue goes
high, the value on port c is subtracted from the topmost element in the queue and the result is output to z.
The queue has a depth of 10. If more than 10 values are stored in the queue, overflow goes high. Similarly,
an attempt to read a value from an empty queue results in underflow going high.
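A behavioral Python model of the Fifo example is sketched below. Several details are assumptions not stated in the text: results wrap modulo 16 because the ports are four bits wide, the "topmost" element is taken to be the oldest entry, a value that would overflow the queue is discarded, overflow and underflow stay high once set, and data_ready is not modeled.

```python
from collections import deque


class FifoModel:
    DEPTH = 10      # queue depth from the description
    MASK = 0xF      # data ports are four bits wide (assumed to wrap)

    def __init__(self) -> None:
        self.queue = deque()
        self.z = 0
        self.overflow = False
        self.underflow = False

    def enqueue(self, a: int, b: int) -> None:
        if len(self.queue) == self.DEPTH:
            self.overflow = True           # an 11th value does not fit; it is dropped
            return
        self.queue.append((a + b) & self.MASK)

    def dequeue(self, c: int) -> None:
        if not self.queue:
            self.underflow = True          # read from an empty queue
            return
        self.z = (self.queue.popleft() - c) & self.MASK
```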

Shuffle: The Shuffle is a high speed reconfigurable 32 bit shuffle-exchange network for parallel signal
processing. The shuffle exchange is a commercial product that Texas Instruments, Inc. (TI) used to
manufacture (TI part SN74AS8839) [33]. The shuffle-exchange network has a four level architecture that
supports five types of multiplexed data permutations: (1) forward shuffle; (2) inverse shuffle; (3) upper
broadcast; (4) lower broadcast; and (5) bit exchange. A seven bit control word determines the type of
permutation. Additional control is provided by a two bit output selector, which determines whether the output
is composed of: (1) all 1's; (2) the result of the 4-level shuffle in the lower 16 bits and all 1's in the upper
16 bits; (3) all 1's in the lower 16 bits and the result of the 4-level shuffle in the upper 16 bits; or (4) the
result of the 4-level shuffle in all 32 bits. The shuffle exchange is modeled as a five process description:
each of the four levels of shuffle and the output control are modeled as separate processes.
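To make two of the permutations concrete, the Python sketch below implements a textbook perfect shuffle and its inverse on a 32-bit word, plus the four output-selector cases described above. It is not derived from the TI part's data sheet: the bit ordering, the 0 to 3 encoding of the selector, and the omission of the broadcast and bit-exchange permutations and of the seven bit control word are all assumptions.

```python
WIDTH = 32
HALF = WIDTH // 2
LOW16 = 0x0000FFFF
HIGH16 = 0xFFFF0000


def to_bits(x: int) -> list:
    return [(x >> i) & 1 for i in range(WIDTH)]      # index 0 is the LSB


def from_bits(bits: list) -> int:
    return sum(b << i for i, b in enumerate(bits))


def forward_shuffle(x: int) -> int:
    """Perfect shuffle: interleave the lower and upper 16-bit halves."""
    bits = to_bits(x)
    out = []
    for i in range(HALF):
        out += [bits[i], bits[i + HALF]]
    return from_bits(out)


def inverse_shuffle(x: int) -> int:
    """Undo forward_shuffle: even positions form the lower half, odd the upper."""
    bits = to_bits(x)
    return from_bits(bits[0::2] + bits[1::2])


def output_select(sel: int, result: int) -> int:
    """Two bit output selector; the 0..3 encoding is a hypothetical choice."""
    if sel == 0:
        return LOW16 | HIGH16                        # all 1's
    if sel == 1:
        return HIGH16 | (result & LOW16)             # shuffle result in lower 16 bits
    if sel == 2:
        return (result & HIGH16) | LOW16             # shuffle result in upper 16 bits
    return result & (LOW16 | HIGH16)                 # shuffle result in all 32 bits
```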
Level | Name


Area  (sq.  mm) 


5 


5 


8 


12 


15 


Node  Switches 


50000 


60000 


80000 


120000 


150000 


200000 


200000 


200000 


300000 


400000 


500000 


800000 


1000000 


50000 


60000 


80000 


120000 


150000 


200000 


250000 


300000 


Speed  (ns) 


50 


50 


50 


50 


50 


50 


50 


50 


50 


50 


50 


50 


50 


50 


50 


50 


50 


50 


50 


50 


50 


50 


50 


50 


50 


50 


75 


75 


75 


100 


100 


100 


100 


Cost ($)


400 


500 


600 


700 


800 


900 


1000 


1200 


1300 


1400 


1500 


1600 


1800 


250 


300 


350 


400 


450 


500 


550 


600 


800 


900 


1000 


1200 


1500 


10000 


15000 


20000 


300 


400 


500 


600 


800 


1200 


Tiny1


1 | Tiny4


1 | Small1


1 | PGA-1


1 | PGA-2


2 

Pl-5 

2 

Cer-1 

2 

Cer-2 

Cer-3 


2 

PGA-1C 

12 

2 

PGA-2C 

15 

2 

PGA-3C 

18 

2 

PGA-4C 

20 

2 

PGA-5C 

20 

2 

MCM-1 

200 

2 

MCM-2 

300 

2 

MCM-3 

400 

3 

Board-1

300 

3 

Board-2 

400 

3 

Board-3 

500 

3 

Board-4 

600 

3 

Board-5 

800 

3 

Board-6 

1000 

2000000 


3000000 


4000000 


5000000 


Table  2:  Package  Alternatives 
dyn1-dyn10: dyn is a five process description that monitors and maintains the dynamic length and the
maximum length to which a queue in a producer-consumer problem grows. The signals enqueue and dequeue
trigger computation of the length and max_length of the queue. Four processes check for settling
of values on enqueue, dequeue, length, and max_length; the fifth process uses a procedure to compute the
length of the queue depending on enqueue or dequeue and then computes max_length. dyn1-dyn10 are
generated by making multiple instantiations (1-10) of the basic five process description of dyn.
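The bookkeeping that dyn performs can be sketched in Python as follows; the treatment of simultaneous enqueue and dequeue (no change) and of a dequeue on an empty queue (ignored) are assumptions.

```python
class DynLength:
    """Tracks the current and maximum length of a producer-consumer queue."""

    def __init__(self) -> None:
        self.length = 0
        self.max_length = 0

    def update(self, enqueue: bool, dequeue: bool) -> None:
        if enqueue and not dequeue:
            self.length += 1
        elif dequeue and not enqueue and self.length > 0:
            self.length -= 1
        self.max_length = max(self.max_length, self.length)
```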

alu1-alu5: alu is a nine process description of an arithmetic and logic unit (ALU). Eight processes carry
out arithmetic and logical operations on a pair of 4 bit inputs. The ninth process uses a 3 bit function
select to determine which function result (which arithmetic or logical operation) is output.
alu1-alu5 are generated by making multiple instantiations (1-5) of the basic nine process description of
alu.

We first present results for hierarchical RTL partitioning, and for multicomponent synthesis and hierarchical
package design, separately, and then compare the results of the two approaches. Switching activity
constraints are not considered in hierarchical RTL partitioning and package design.


Example | Num Lines (VHDL) | Num Proc
Mv Mc   |   75 |  3
Fifo    |   65 |  3
Shuffle |  472 |  5
dyn1    |  132 |  5
dyn2    |  254 | 10
dyn3    |  376 | 15
dyn4    |  498 | 20
dyn5    |  620 | 25
dyn6    |  742 | 30
dyn7    |  864 | 35
dyn8    |  986 | 40
dyn9    | 1108 | 45
dyn10   | 1230 | 50
alu1    |  100 |  9
alu2    |  188 | 18
alu3    |  276 | 27
alu4    |  364 | 36
alu5    |  452 | 45

Table 3: Design Data for Examples


5.1  Hierarchical  RTL  Partitioning 


Table 4 presents experimental results for the hierarchical RTL partitioning approach for the above examples:
the number of RTL components in the netlist, the mapping of partition segments to packages from the package
library, the cost of the hierarchical partition (the cost of the packages that partition segments are mapped onto)
together with the cost constraint, and the execution time for each design.

We did not run the alu or dyn examples with more instantiations (larger example sizes) because execution
times for RTL netlists of the larger examples shown in Table 4 are on the order of 30 hours. These examples
show a very quick rise in execution times with increase in design sizes (in terms of RTL components in the
Example | No. of Comps | Level-1 Segments and Mapping (si-pi) | Level-2 | Level-3 | Cost/Constraint ($) | Execution Time
        |     |                                        | s11-Pl-1, s12-PGA-5C                         | s21-Board-1 | 4250/5000   | 13.2 s
alu1    |  65 | s1-Small3                              | s11-Cer-3                                    | s21-Board-1 | 1900/2500   | 6.5 s
Fifo    |  76 | s1-Small2                              | s11-Cer-2                                    | s21-Board-1 | 1750/3000   | 6.4 s
dyn1    | 128 |                                        | s11-Pl-5                                     | s21-Board-1 | 1550/2000   | 11.9 s
alu2    | 123 | s1-PGA-3, s2-PGA-4                     | s11-PGA-3C, s12-PGA-4C                       | s21-Board-1 | 5400/5000   | 49 min 36 s
alu3    | 161 | s1-PGA-6, s2-PGA-6, s3-PGA-6, s4-Tiny1 | s11-PGA-5C, s12-PGA-5C, s13-PGA-5C, s14-Pl-1 | s21-Board-1 | 10850/8000  | 1 hr 44 min
dyn2    | 234 | s1-Tiny1, s2-Tiny1, s3-PGA-4, s4-PGA-1 | s11-PGA-4C, s12-PGA-1C, s13-Pl-4             | s21-Board-1 | 6200/3200   | 1 hr 49 min
alu4    | 205 | 23 segments                            | s11-MCM-3                                    | s21-Board-2 | 53600/15000 | 30 hr 31 min
dyn3    | 334 | 21 segments                            | s11-MCM-3                                    | s21-Board-2 | 53000/3300  | 31 hr 28 min

Table 4: Hierarchical Partitioning Results


Note: s-p denotes the mapping of segment s onto package p from the package library.

design).  We  did  not  fully  observe  the  effect  of  back-tracking  on  these  examples  because  of  the  rapidity 
with  which  the  execution  times  increased. 


5.2  Multicomponent  Synthesis  and  Hierarchical  Package  Design 

Tables 5, 6, and 7 present results of multicomponent synthesis and hierarchical package design for the design
examples in Table 3 with the package library shown in Table 2. For each example, Table 5 presents: (1) the
number of processes; (2) the hierarchical partition segments mapped onto packages from the package library
(at level-1, the partitioning of processes, synthesized into equivalent RTL designs, into partition segments);
(3) the actual number of back-tracks by the hierarchical partitioning and package design algorithm and the
limit on the number of back-tracks (BTK); (4) the actual cost of the design and the cost constraint; and (5) the
execution time. With a larger number of processes it is difficult to present the assignment of processes to
partition segments, so Table 6 presents the number of processes on each level-1 partition segment instead of
the individual partitions. With an even larger number of processes, it is difficult to present even the details of
level-2 partition segments. Thus, Table 7 presents the following data for all designs in Table 3: (1) number
of processes; (2) number of back-tracks/BTK; (3) actual cost/constraint; and (4) execution time.

We have presented results on multicomponent synthesis with hierarchical package design and on hierarchical
RTL partitioning with package design. All these results establish and reinforce the validity of our approach.
An interesting observation that vindicates our choice of the back-tracking algorithm is that, across all our
examples, the largest number of back-tracks is three, in the case of the alu4 example (Table 7). This is
because the algorithm back-tracks only if it can potentially find a solution with a better cost, and because
it homes in on a solution very rapidly.
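To make this back-tracking policy concrete, the following minimal sketch shows a generic branch-and-bound search with a back-track limit. It is an illustration under stated assumptions, not the hpp algorithm (Algorithm 4.2): segments are assumed to be mapped independently onto hypothetical packages with made-up capacity, area, and dollar cost, and a single board-area limit stands in for the packaging constraints that can make the cheapest first choice a poor one. The names `Package` and `bounded_backtrack` are ours.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Package:
    name: str
    capacity: int  # hypothetical: largest segment (in processes) it can hold
    area: int      # hypothetical: board area the package occupies
    cost: int      # hypothetical: dollar cost


def bounded_backtrack(segments: List[int], packages: List[Package],
                      max_area: int, btk_limit: int = 10
                      ) -> Tuple[float, Optional[List[str]], int]:
    """Map each segment onto one package at minimum total dollar cost.

    Options are tried cheapest-first, so the first complete mapping is the
    greedy one. Trying a further option after one has already been explored
    counts as a back-track; it is attempted only if an optimistic cost bound
    can still beat the best mapping found so far, and at most btk_limit times.
    """
    n = len(segments)
    cheapest = min(p.cost for p in packages)  # lower bound per remaining segment
    best_cost: float = float("inf")
    best_plan: Optional[List[str]] = None
    backtracks = 0

    def search(i: int, cost: int, area: int, plan: List[str]) -> None:
        nonlocal best_cost, best_plan, backtracks
        if i == n:
            if cost < best_cost:
                best_cost, best_plan = cost, list(plan)
            return
        options = sorted((p for p in packages if p.capacity >= segments[i]),
                         key=lambda p: p.cost)
        explored = False
        for pkg in options:
            if area + pkg.area > max_area:
                continue  # would violate the board-area limit
            bound = cost + pkg.cost + cheapest * (n - i - 1)
            if bound >= best_cost:
                break  # options are cost-sorted: no later one can do better
            if explored:
                if backtracks >= btk_limit:
                    return
                backtracks += 1
            explored = True
            plan.append(pkg.name)
            search(i + 1, cost + pkg.cost, area + pkg.area, plan)
            plan.pop()

    search(0, 0, 0, [])
    return best_cost, best_plan, backtracks


PKGS = [Package("A", capacity=2, area=3, cost=100),
        Package("B", capacity=2, area=2, cost=120),
        Package("C", capacity=2, area=1, cost=200)]
print(bounded_backtrack([1, 1], PKGS, max_area=4))               # (240, ['B', 'B'], 1)
print(bounded_backtrack([1, 1], PKGS, max_area=4, btk_limit=0))  # (300, ['A', 'C'], 0)
```

In this made-up instance the greedy descent picks A for the first segment, the area limit then forces the expensive C, and a single back-track recovers the cheaper B, B mapping; with `btk_limit=0` the search keeps the greedy design.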

               Segments and Mapping (s-p)
Example Procs  Level-3      Level-2     Level-1: processes                          BkTrk/BTK  Cost/Constr. ($)  Exec Time (s)
Mv Me       3  s21-Board-1  s11-PGA-5C  s1-PGA-6: EXE                               1/10       5600/5000         6
                            s12-PGA-1C  s2-PGA-1: FET, DEC
Fifo        3  s21-Board-1  s11-Pl-5    s1-Small1: FIFO, PRODUCER, CONSUMER         0/10       1550/3000         2.7
Shuffle     5  s21-Board-2  s11-PGA-4C  s1-PGA-4: shuffle-1                         0/10       13900/12000       59.8
                            s12-PGA-4C  s2-PGA-4: shuffle-2
                            s13-PGA-4C  s3-PGA-4: shuffle-3
                            s14-PGA-4C  s4-PGA-4: shuffle-4
                            s15-PGA-4C  s5-PGA-4: output
dyn1        5  s21-Board-1  s11-Cer-3   s1-Small3: s1_p_1, s1_p_pt, s1_p_s1,        1/10       1900/2000         3.6
                                                   s1_p_2, s1_p_st
alu1        9  s21-Board-1  s11-Cer-2   s1-PGA-1: s1_nbp, s1_nap, s1_np, s1_outp    1/10       3100/2500         100.7
                                        s2-Tiny1: s1_mp, s1_ap, s1_op
                            s12-Pl-1    s3-Tiny1: s1_dp, s1_sp
dyn2       10  s21-Board-1  s11-Cer-3   s1-Small1: s2_p_s1, s2_p_pt, s2_p_2         2/10       3350/3200         212.7
                                        s2-Tiny1: s2_p_st, s1_p_st
                            s12-Pl-5    s3-Small1: s1_p_s1, s1_p_pt, s1_p_1, s1_p_2

Table 5: Multicomponent Synthesis with Hierarchical Package Design Results


Note: s-p denotes the mapping of segment s onto package p from the package library. Also, at level-1, the
mapping of processes to partition segments is presented.

               Segments and Mapping (s-p)
Example Procs  Level-3      Level-2     Level-1               BkTrk/BTK  Cost/Constr. ($)  Exec Time (s)
dyn3       15  s21-Board-1  s11-Pl-3    s1-Tiny3: 3 procs     1/10       5000/5000         126.1
                            s12-Pl-0    s2-Small1: 4 procs
                            s13-Pl-5    s3-Small1: 4 procs
                            s14-Pl-0    s4-Small1: 4 procs
alu2       18  s21-Board-1  s11-PGA-3C  s1-PGA-3: 6 procs     1/10       6700/5000         412.8
                            s12-Pl-0    s2-Small1: 5 procs
                            s13-PGA-2C  s3-Tiny1: 1 proc
                                        s4-Tiny1: 3 procs
                                        s5-Tiny1: 2 procs
                                        s6-Tiny1: 1 proc
dyn4       20  s21-Board-1  s11-Pl-5    s1-Small1: 5 procs    0/10       6350/8000         229.3
                                        s2-Tiny1: 1 proc
                            s13-Cer-2   s3-Small2: 6 procs
                            s14-Pl-3    s4-Tiny3: 3 procs
                            s15-Pl-4    s5-Tiny4: 4 procs
                                        s6-Tiny1: 1 proc

Table 6: Multicomponent Synthesis and Package Design Results (contd.)


Note: s-p denotes the mapping of segment s onto package p from the package library. Also, at level-1, the
number of processes on each partition segment is presented.

Example   Procs   BkTrk/BTK   Cost/Constraint ($)   Exec Time (s)
Mv Me         3   1/10        5600/5000                         6
Fifo          3   0/10        1550/3000                       2.7
Shuffle       5   0/10        13900/12000                    59.8
dyn1          5   1/10        1900/2000                       3.6
alu1          9   1/10        3100/2500                     100.7
dyn2         10   2/10        3350/3200                     212.7
dyn3         15   1/10        5000/5000                     126.1
alu2         18   1/10        6700/5000                     412.8
dyn4         20   0/10        6350/8000                     229.3
dyn5         25   0/10        8350/8000                     349.5
alu3         27   0/10        12700/8000                      579
dyn6         30   1/10        9850/9000                    1470.7
dyn7         35   2/10        11200/10000                    3141
alu4         36   3/10        14100/15000                  1549.4
dyn8         40   1/10        11850/12000                  1863.5
dyn9         45   1/10        13800/13000                  3684.1
alu5         45   2/10        17750/18000                  1626.4
dyn10        50   2/10        16850/15000                  6452.2

Table 7: Multicomponent Synthesis and Package Design Results (contd.)


5.3  Multicomponent  Synthesis  vs  Hierarchical  RTL  Partitioning 


Table 8 compares multicomponent synthesis and hierarchical package design with hierarchical RTL partitioning.
The following information is presented for each example: (1) the number of processes in the behavioral
description; (2) the number of RTL components in a single-chip synthesized design; (3) the number of
back-tracks/limit on back-tracks, the cost of the packaging design, and the execution time for (a)
multicomponent synthesis and (b) RTL partitioning; and (4) the dollar cost constraint for the design. For each
example, the better dollar cost solution is highlighted. RTL partitioning yields better designs for smaller
examples, where the number of synthesized RTL components is relatively small (< 200). For larger examples,
multicomponent synthesis clearly outperforms RTL partitioning in the quality of solutions. Also, RTL
partitioning generally takes an order of magnitude more time than multicomponent synthesis (two orders of
magnitude for larger examples, e.g., alu4 and dyn3).
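The cost and run-time comparison can be checked directly against the table. The sketch below transcribes the rows of Table 8 for which both approaches produced a design (MC = multicomponent synthesis, RTL = hierarchical RTL partitioning), orders them by RTL component count, and reports which approach gave the cheaper design together with the ratio of RTL partitioning time to multicomponent synthesis time. The helper name `compare` is ours, and the values are copied from the table rather than recomputed.

```python
# Rows of Table 8 where both approaches produced a design:
# (example, RTL comps, MC cost $, MC time s, RTL cost $, RTL time s)
ROWS = [
    ("Mv Me", 53, 5600, 6.0, 4250, 13.2),
    ("Fifo", 76, 1550, 2.7, 1750, 6.4),
    ("dyn1", 128, 1900, 3.6, 1550, 11.9),
    ("alu1", 65, 3100, 100.7, 1900, 6.5),
    ("dyn2", 234, 3350, 212.7, 6200, 6560.0),
    ("dyn3", 334, 5000, 126.1, 53000, 113272.0),
    ("alu2", 123, 6700, 412.8, 5400, 2976.0),
    ("alu3", 161, 12700, 579.0, 10850, 6251.0),
    ("alu4", 205, 14100, 1549.4, 53600, 109850.0),
]


def compare(rows):
    """Per example, ordered by RTL component count, return
    (name, comps, cheaper approach, RTL time / MC time rounded to 0.1)."""
    out = []
    for name, comps, mc_cost, mc_t, rtl_cost, rtl_t in sorted(rows, key=lambda r: r[1]):
        cheaper = "RTL" if rtl_cost < mc_cost else "MC"
        out.append((name, comps, cheaper, round(rtl_t / mc_t, 1)))
    return out


for row in compare(ROWS):
    print(row)
```

On these rows, RTL partitioning gives the cheaper design for every example below 200 RTL components except Fifo, multicomponent synthesis gives the cheaper design for every example above 200, and the time ratio ranges from about 0.06 (alu1) to roughly 900 (dyn3).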


5.4  Functional  Validation 

We have presented results on the performance of the multicomponent synthesis and hierarchical package
design algorithm (hpp, Algorithm 4.2) for multicomponent synthesis with hierarchical package design, and for
hierarchical RTL partitioning and package design, on a number of examples. Another important step is to
functionally validate the designs produced. The output of hierarchical partitioning and package design
comprises: (1) in the case of multicomponent synthesis, a set of behaviors (VHDL), corresponding to the
individual register-level segments that together constitute the multicomponent design, to be synthesized
into equivalent RTL designs using a high-level synthesis system such as DSS [40, 41]; alternatively, in the
case of RTL netlists, a set of RTL design segments; (2) a set of structures (VHDL) that realizes the hierarchical

                        Multicomponent Synthesis   Hierarchical RTL Partitioning
Example    Num   RTL    Btk/   Cost      Exec      Btk/   Cost       Exec    Cost
          Proc  Comp    BTK     ($)  Time (s)      BTK     ($)   Time (s) Constr.
Mv Me        3    53    1/10   5600         6      0/10   4250*      13.2    5000
Fifo         3    76    0/10   1550*      2.7      0/10   1750        6.4    3000
Shuffle      5   379    0/10  13900      59.8      -         -          -   12000
dyn1         5   128    1/10   1900       3.6      0/10   1550*      11.9    2000
alu1         9    65    1/10   3100     100.7      0/10   1900*       6.5    2500
dyn2        10   234    2/10   3350*    212.7      0/10   6200       6560    3200
dyn3        15   334    1/10   5000*    126.1      0/10  53000     113272    5000
alu2        18   123    1/10   6700     412.8      0/10   5400*      2976    5000
dyn4        20     -    0/10   6350     229.3      -         -          -    8000
dyn5        25     -    0/10   8350     349.5      -         -          -    8000
alu3        27   161    0/10  12700       579      0/10  10850*      6251    8000
dyn6        30     -    1/10   9850    1470.7      -         -          -    9000
dyn7        35     -    2/10  11200      3141      -         -          -   10000
alu4        36   205    3/10  14100*   1549.4      0/10  53600     109850   15000
dyn8        40     -    1/10  11850    1863.5      -         -          -   12000
dyn9        45     -    1/10  13800    3684.1      -         -          -   13000
alu5        45     -    2/10  17750    1626.4      -         -          -   18000
dyn10       50     -    2/10  16850    6452.2      -         -          -   15000

Note: * highlights the better dollar cost solution for each example where both approaches produced a design;
- indicates that no result is reported.

Table 8: Multicomponent Synthesis vs Hierarchical RTL Partitioning


multicomponent design; and (3) a binding of behaviors (RTL segments) and structures to appropriate
packages from the package library at each level of the package and design hierarchy. From the viewpoint
of functional validation, (1) and (2) are the important outputs. The functional validation approach consists
of: (1) synthesizing register-level designs from the behavior segments using a high-level synthesis system
such as DSS (in the case of multicomponent synthesis); and (2) simulating the multicomponent design in VHDL
using the same characteristic set of test vectors used for validating the behavioral specification (see
Section 3 on profiling stimuli). We have functionally validated the Move Machine, Fifo, and Shuffle examples by
simulating the output multicomponent designs in VHDL. (The other examples, alu1-alu10 and dyn1-dyn10, are
synthetic and are used to illustrate the capability of the multicomponent synthesis and hierarchical package
design algorithm to handle large designs.) In addition to functionally validating these designs at the VHDL
level, we have validated the shuffle exchange example at the layout level by switch-level simulation. We
generated the layout of the hierarchical design using the Lager IV silicon compiler [22], extracted
switch-level models from the layout, and simulated these models using IRSIM, a switch-level simulator.
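The second validation step, applying the same characteristic test vectors to the behavioral specification and to the multicomponent design and comparing the responses, can be sketched as follows. The Python functions only stand in for VHDL simulations; the 8-bit multiply-accumulate design, its split into two segments, and the names `validate`, `behavior`, `seg_mult`, `seg_add`, and `multicomponent` are hypothetical illustrations, not any of the examples above.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple

Vector = Tuple[int, ...]
Model = Callable[[Vector], Vector]


def validate(reference: Model, candidate: Model,
             stimuli: Iterable[Vector]) -> List[Tuple[int, Vector, Vector, Vector]]:
    """Apply identical stimuli to both models and collect every mismatch."""
    mismatches = []
    for i, vec in enumerate(stimuli):
        want, got = reference(vec), candidate(vec)
        if want != got:
            mismatches.append((i, vec, want, got))
    return mismatches


# Hypothetical behavioral model: one 8-bit multiply-accumulate step.
def behavior(v: Vector) -> Vector:
    a, b, acc = v
    return ((acc + a * b) & 0xFF,)


# Hypothetical two-segment implementation of the same behavior, as if the
# multiplier and the adder had been placed in different packages.
def seg_mult(a: int, b: int) -> int:
    return (a * b) & 0xFF


def seg_add(p: int, acc: int) -> int:
    return (acc + p) & 0xFF


def multicomponent(v: Vector) -> Vector:
    a, b, acc = v
    return (seg_add(seg_mult(a, b), acc),)


# A characteristic stimulus set: a coarse sweep over all three 8-bit inputs.
STIMULI = list(product(range(0, 256, 17), repeat=3))
print(len(validate(behavior, multicomponent, STIMULI)))  # 0
```

An empty mismatch list means the partitioned design reproduced the behavioral responses on every vector; a segment that dropped the 8-bit wrap-around, for instance, would show up immediately as mismatches.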


6  Conclusions  and  Discussion 


We have presented a generic hierarchical partitioning and package design technique for multichip designs. It
takes a generic graph specification (a process graph in the case of multicomponent synthesis, or an RTL
netlist in the case of RTL partitioning), a set of available packaging options, and an overall cost constraint
on the design, and generates a multichip design while simultaneously constructing a physical package hierarchy
for the design. We have demonstrated two applications of this generic technique for multichip design: (1)
hierarchical RTL partitioning and (2) multicomponent synthesis with hierarchical package design.

We have presented results for both approaches and compared their performance with respect to the quality of
the designs produced and the execution times for a number of typical design examples. RTL partitioning and
package design yields good results for examples where the number of RTL components in the synthesized design
is less than 200. However, RTL partitioning and package design does not handle thermal (switching activity)
constraints on the design and cannot be used for designs where thermal considerations are important. When
partitioning at the RTL netlist level, the design architecture has already been frozen (during high-level
synthesis), so alternate multichip designs cannot be explored during hierarchical RTL partitioning, whereas
multicomponent synthesis explores the design space by considering alternate implementations during high-level
multicomponent synthesis. Also, thermal profiling of RTL designs is too time consuming (Section 3) and is not
viable for large designs. Multicomponent synthesis with hierarchical package design yields better results for
the larger examples and also considers switching activity constraints on the design. In addition, execution
times for multicomponent synthesis are much lower than those for RTL partitioning for almost all our
examples. Thus, multicomponent synthesis with hierarchical package design is the preferred approach for large
designs and high-performance packaging technology.
Abstract

Tradeoff analysis is a central aspect of any design process. Languages and tools to support
performance modeling and evaluation are necessary to facilitate rapid prototyping of designs.
A performance modeling and tradeoff analysis environment reduces the overall design time
of both the prototype and the final product by helping designers determine which parameters
of a design are critical for meeting a set of desired performance goals. This paper
describes a case study in performance modeling using a language called PDL (Performance
Modeling Language). The PDL system supports tradeoff analysis and performance visualization.
This paper also addresses some of the key issues for successful tradeoff analysis during
rapid prototyping and explains how many features of PDL make it a suitable choice for this
purpose.


1  Introduction 

During  any  design  process,  many  decisions  are  made  which  affect  the  overall  performance  of 
the  design.  Many  such  decisions  result  from  detailed  tradeoff  analysis  among  several  related 
attributes  of  the  design.  For  example,  choice  of  the  input  clock  frequency  depends  partly  upon 
the  desired  upper  bound  on  power  consumption  and  the  desired  lower  bound  on  the  throughput 
rate.  Decisions  such  as  these  are  made  at  various  levels  of  the  design  process  from  specification 
to  implementation.  In  order  to  make  effective  design  choices,  the  design  environment  and 
supporting  tools  must  be  well  suited  for  performing  tradeoff  analysis  throughout  the  design 
process,  at  multiple  levels  of  abstraction. 

1 This work was partially supported by the ARPA RASSP program and monitored by the Wright Lab, USAF,
under contract number F33615-93-C-1316, and by the Semiconductor Research Corporation under contract number
DJ-293.
Performance modeling and tradeoff analysis involves developing a model of the design at some
level of abstraction (behavioral, macro-level, register-transfer level, etc.). During model
evaluation, various parameters of the model are altered and their effect on other parameters is
observed. Based on this data, further design choices are made. The model is modified accordingly,
and the process of performance evaluation and tradeoff analysis is repeated. This process
continues until an acceptable design that meets all the required performance goals is reached.

For effective tradeoff analysis to occur during rapid prototyping, the design environment should
have the following features: (1) modeling at any level of abstraction must be supported, since
tradeoff analysis occurs at various levels during the design process; (2) the modeling environment
must lend itself to reusability, because reusing models reduces the time spent writing a model for
each new version of the design; (3) the performance evaluation engine should be flexible enough to
partially evaluate a model when the values of some parameters are not yet known, which facilitates
incremental analysis of the model; and (4) the modeling environment must be easy to use, so that
performance models can be developed and analyzed quickly and efficiently.

We have developed a modeling and tradeoff analysis environment, the PDL System, which
meets all these criteria and is well suited for use during rapid prototyping [1]. A PDL
program declares the design objects (various kinds of modules, nets, and ports) that can appear
in a design. In addition, the containment and connectivity relationships among the various
kinds of objects can be declared in the PDL program. When a specific design in the form
of a PDL net-list file is compiled with a PDL program, the compiler determines whether
the net-list contains objects of the kinds declared in the PDL program and whether the net-list
structure conforms to the object composition relationships (containment and connectivity)
declared in the PDL program.

In  addition,  a  PDL  program  declares  various  types  of  attributes  and  attaches  them  to  the 
design  objects.  Attribute  evaluation  rules  can  also  be  specified  and  attached  to  the  design 
objects.  A  PDL  program  does  not  specify  any  order  among  these  rules;  they  are  viewed  as 
mathematical  equations.  Given  a  design  net-list  that  conforms  to  the  PDL  program,  the  PDL 
compiler  generates  a  global  attribute  dependency  graph  and  automatically  infers  a  complete 
evaluation  order  among  the  various  attribute  evaluation  rules.  An  executable  performance 
model  containing  the  proper  evaluation  sequence  for  all  the  evaluation  rules  is  generated. 
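Inferring an evaluation order from order-free rules amounts to a topological sort of the attribute dependency graph. The minimal sketch below is not the PDL compiler: it encodes a few hypothetical rules, named after the quantities in Eqs. (1)-(3) of Section 2, as (dependencies, function) pairs listed deliberately out of order, and it relies on Python's standard `graphlib.TopologicalSorter` (Python 3.9+), which raises `graphlib.CycleError` if the rules are circular.

```python
from graphlib import TopologicalSorter

# Hypothetical rules: attribute -> (attributes it depends on, function).
# They are listed out of order on purpose, like the order-free PDL rules.
RULES = {
    "exec_time": (("binding_time", "rd_overhd", "wr_overhd"),
                  lambda bt, rd, wr: bt + rd + wr),
    "rd_overhd": (("num_var", "read_time"), lambda n, rt: n * rt),
    "wr_overhd": (("num_var", "write_time"), lambda n, wt: n * wt),
}


def evaluation_order(rules):
    """Topologically sort the dependency graph; keep only rule targets."""
    graph = {attr: set(deps) for attr, (deps, _) in rules.items()}
    return [a for a in TopologicalSorter(graph).static_order() if a in rules]


def evaluate(rules, primitives):
    """Evaluate every rule once, in dependency order, from primitive inputs."""
    values = dict(primitives)
    for attr in evaluation_order(rules):
        deps, fn = rules[attr]
        values[attr] = fn(*(values[d] for d in deps))
    return values


order = evaluation_order(RULES)
result = evaluate(RULES, {"num_var": 4, "read_time": 2.0,
                          "write_time": 3.0, "binding_time": 10.0})
print(order[-1], result["exec_time"])  # exec_time 30.0
```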

Figure 1 illustrates the PDL System and the design process for generating and evaluating
performance models. Once a model has been compiled, the evaluator can be configured to evaluate it
and collect data in several different ways. In the simplest case, a model is completely evaluated
with a complete set of input data. A configuration can also be specified that collects data for
graphical analysis or tabulation, including ranges of values for particular input attributes.
Incremental evaluation of a performance model is also possible (the feedback loop in the figure):
only some of the input data is supplied, and the result is another performance model. This model
can be evaluated further as more input data is specified.
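Incremental evaluation can be pictured as evaluating every rule whose inputs are already known and returning the remaining rules as a smaller, residual model. In this sketch each hypothetical rule is a (dependencies, function) pair; the function name `partial_evaluate` and the attribute names are illustrative assumptions, not PDL's evaluator interface. The final loop re-evaluates the residual model over a range of values for one input attribute.

```python
from graphlib import TopologicalSorter

# Hypothetical rules: attribute -> (attributes it depends on, function).
RULES = {
    "exec_time": (("binding_time", "rd_overhd", "wr_overhd"),
                  lambda bt, rd, wr: bt + rd + wr),
    "rd_overhd": (("num_var", "read_time"), lambda n, rt: n * rt),
    "wr_overhd": (("num_var", "write_time"), lambda n, wt: n * wt),
}


def partial_evaluate(rules, known):
    """Evaluate the rules whose inputs are known; return (values, residual rules)."""
    values = dict(known)
    residual = {}
    graph = {attr: set(deps) for attr, (deps, _) in rules.items()}
    for attr in TopologicalSorter(graph).static_order():
        if attr not in rules:
            continue  # a primitive input or an already computed value
        deps, fn = rules[attr]
        if all(d in values for d in deps):
            values[attr] = fn(*(values[d] for d in deps))
        else:
            residual[attr] = rules[attr]
    return values, residual


# First pass: only the read-side inputs are known.
values, residual = partial_evaluate(RULES, {"num_var": 4, "read_time": 2.0})
print(sorted(residual))  # ['exec_time', 'wr_overhd']

# Later passes reuse the residual model, sweeping binding_time over a range.
for bt in (5.0, 10.0, 20.0):
    final, rest = partial_evaluate(residual, {**values, "write_time": 3.0,
                                              "binding_time": bt})
    print(bt, final["exec_time"])  # 25.0, 30.0, 40.0 for the three values
```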

This paper outlines the features of the PDL System and, through a hardware/software co-design
example, illustrates how the PDL system can be used for effective tradeoff analysis during rapid
prototyping. The rest of this paper is organized as follows: Section 2 introduces the
hardware/software co-design example used as the case study in this paper. Section 3 develops the
PDL performance model of the co-design example and, through this example, introduces the PDL
language itself. Section 4 describes the performance evaluation and tradeoff analysis process
using the PDL system. Section 5 discusses the results of this analysis for a specific co-design
example: a JPEG-like image compression scheme. Section 6 contains some concluding remarks.

Figure 1: Overview of the PDL System


2 Hardware/Software Co-design with Coprocessors

Rapid prototyping for hardware/software co-design of embedded processor and coprocessor
systems is a current research area. The specification for co-design is usually represented as a
task graph, where nodes represent tasks and edges represent communication channels [2]. For
hardware/software co-design, the goal is to determine which tasks should be implemented in
hardware or software based on some performance criterion. When the target architecture
is an embedded system, several hardware tasks can be implemented as ASICs, and all of the
software tasks are allocated to execute on an embedded processor. In a coprocessor system,
only a single task can be allocated to hardware, and all other tasks are allocated to software;
again, all software tasks usually run on the same processor. A coprocessor is a configurable
plug-in board connected to a main processor such as a workstation or a personal computer.
Components of the coprocessor board include a programmable chip, interface memory, and a
predefined interface protocol to the main computer [3].

Several factors must be considered while developing a performance model for co-design. From
a software perspective, tasks have certain properties that govern their execution sequence. If
all tasks are bound to execute in software, then only one task can execute at a time, and the
next scheduled task cannot begin execution until all preceding tasks have finished. Figure 2
is an example of a task graph. Although some tasks appear to be independent of each other, an
execution order is associated with this task graph because only one task can execute at a time.
In this example, task 6 cannot begin executing until tasks 3, 4, and 5 finish. This execution
order is referred to as the task schedule for a particular task graph.

Another feature of hardware/software co-designs is the inherent parallelism available between
the hardware and the software. It is exploited by having a hardware task execute in the
coprocessor simultaneously with a software task executing in the main processor. Figure 3 shows
a simple task graph where hardware/software parallelism can be exploited. When none of the
tasks are bound to hardware, the only task schedule is task 1 followed by task 2, and so forth.
If task 2 were bound to hardware, the task schedule could be pipelined so that tasks 2 and 3
operate simultaneously. Pipelining requires a data buffer between tasks 2 and 3.
Figure 2: Example Task Graph

After task 2 finishes executing the first time, its output data moves to the buffer at the input of
task 3. Then, the next time task 2 executes, task 3 also executes. The buffer is necessary
to ensure that task 2 cannot write to the same memory being read by task 3 during execution.
Figure 3: Exploiting Parallelism in Task Graph

If the target architecture is a coprocessor board, another modeling parameter is the communication
time between tasks when one task is in software and the other is in hardware. The computer cannot
transmit data directly to the coprocessor; instead, data is transmitted through memory located on
the coprocessor board. Because this memory is located on a board connected to a slower bus
interface, reading and writing data between the coprocessor and the computer's main memory is
slower than ordinary memory transfers within the computer. This has a noticeable impact on the
estimated execution time of a particular set of bindings. In most co-design problems, communication
with the hardware proceeds as follows: one task writes to the coprocessor memory prior to execution
of the hardware task, and once the hardware task finishes, the next software task reads the results
from the coprocessor memory.

The execution time of a task graph is based on a sum of the execution times of the individual
tasks. However, because of the hardware/software parallelism that can occur due to pipelining, it
is not a simple summation of execution times. In addition, a task schedule has to be specified to
start and finish the pipeline. All of these factors must be expressed by the equations for
calculating total execution time.
The execution time of an individual task is given by:

$$\mathit{ExecutionTime} = \mathit{BindingTime} + \sum_{\text{input edges}} \mathit{RdOverHd} + \sum_{\text{output edges}} \mathit{WrOverHd} \qquad (1)$$

$$\mathit{RdOverHd} = \mathit{NumVariables} \times \mathit{ReadTime} \qquad (2)$$

$$\mathit{WrOverHd} = \mathit{NumVariables} \times \mathit{WriteTime} \qquad (3)$$

In these equations, BindingTime is the execution time of a particular task under its hardware or
software binding. As previously mentioned, tasks that must read from or write to the coprocessor
memory have an associated communication time for transferring the data. Recall that a single task
can have several input edges. For each input edge that transmits data via coprocessor memory,
RdOverHd has a non-zero value; if coprocessor memory is not involved, RdOverHd is zero for that
input edge. Thus, the execution time includes all the time necessary for reading data from the
coprocessor memory. A similar sum over all the output edges, WrOverHd, accounts for writing to
coprocessor memory.
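Equations (1)-(3) translate directly into code. In this sketch an edge is assumed to pass through coprocessor memory exactly when its two end tasks have different hardware/software bindings, which mirrors the rd_comm(t1_bind, t2_bind, num_var) rule in Figure 6; the `Edge` class, the function names, and the timing values are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Edge:
    num_variables: int
    src_bind: str  # "hw" or "sw": binding of the task that writes the edge
    dst_bind: str  # "hw" or "sw": binding of the task that reads the edge

    @property
    def via_coprocessor(self) -> bool:
        # Assumption: data crosses coprocessor memory only between hw and sw tasks.
        return self.src_bind != self.dst_bind


def rd_overhd(edge: Edge, read_time: float) -> float:
    """Eq. (2), or zero when coprocessor memory is not involved."""
    return edge.num_variables * read_time if edge.via_coprocessor else 0.0


def wr_overhd(edge: Edge, write_time: float) -> float:
    """Eq. (3), or zero when coprocessor memory is not involved."""
    return edge.num_variables * write_time if edge.via_coprocessor else 0.0


def execution_time(binding_time: float, inputs: Sequence[Edge],
                   outputs: Sequence[Edge], read_time: float,
                   write_time: float) -> float:
    """Eq. (1): binding time plus read overhead on inputs and write overhead on outputs."""
    return (binding_time
            + sum(rd_overhd(e, read_time) for e in inputs)
            + sum(wr_overhd(e, write_time) for e in outputs))


# A software task fed by a hardware task (4 variables) and feeding another
# software task (2 variables); only the input edge pays coprocessor overhead.
t = execution_time(10.0, [Edge(4, "hw", "sw")], [Edge(2, "sw", "sw")],
                   read_time=2.0, write_time=3.0)
print(t)  # 18.0
```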

The total execution time is given by:

$$\mathit{GlobalTime} = \sum_{\text{task1, task2, \ldots}} \max(\text{task1'ExecutionTime}, \text{task2'ExecutionTime}, \ldots) \qquad (4)$$

GlobalTime is a sum of the execution times of the tasks. A task is scheduled and contributes its
ExecutionTime; the next scheduled task's execution time is added to the previous time, and this
continues until all tasks have been scheduled, with GlobalTime accumulating the execution time of
each task. Because pipelining allows more than one task to execute at a time, the total global time
increases only by the maximum ExecutionTime among all tasks scheduled to execute simultaneously.
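Equation (4) can be sketched by representing a task schedule as a list of steps, each step holding the tasks that execute simultaneously. The task names, execution times, and the two-block pipelined schedule below are hypothetical; execution times are held fixed across bindings for simplicity, and producing the schedule itself is a separate problem that the sketch does not address.

```python
from typing import Dict, Iterable, Sequence


def global_time(schedule: Iterable[Sequence[str]],
                exec_time: Dict[str, float]) -> float:
    """Eq. (4): each step adds the largest ExecutionTime among its tasks."""
    return sum(max(exec_time[t] for t in step) for step in schedule)


EXEC = {"t1": 4.0, "t2": 6.0, "t3": 5.0}  # hypothetical ExecutionTime values

# Two data blocks through tasks t1 -> t2 -> t3, all in software (serial).
serial = [["t1"], ["t2"], ["t3"], ["t1"], ["t2"], ["t3"]]
# Same blocks with t2 in hardware: t2 on block 2 overlaps t3 on block 1.
pipelined = [["t1"], ["t2"], ["t1"], ["t2", "t3"], ["t3"]]

print(global_time(serial, EXEC))     # 30.0
print(global_time(pipelined, EXEC))  # 25.0
```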


3  Performance  Model  for  Co-designs 

In this section, we develop a suitable performance model in PDL for co-design performance
estimation. PDL has three basic object types for representing designs: modules, carriers, and
ports. A module can be used to represent any type of component typically found in a design.
A carrier is commonly used for representing transport components such as connections, wires,
buses, and communication channels. Ports are objects used primarily for representing the
connectivity among various design components [4].

In the co-design example, a task graph represents the overall design, and nodes in the task
graph represent tasks. Connections between tasks are directed edges, with no two tasks having
more than one directed edge between them. To represent a task graph in a PDL model, the first
step is to define the various task graph components with PDL objects. Figure 4 shows the PDL
definitions for two ports and a carrier that collectively represent edges in a task graph. The
two ports are defined such that each directed edge has a unique input port and output port.
port task_out_port
end port;

port task_in_port
end port;


carrier edge
  ports
    input  : task_out_port;
    output : task_in_port;
end carrier;


Figure 4: PDL declaration for representing edge and related ports


In the carrier declaration edge there is a ports section. In PDL, various objects can contain
references to other objects; this is known as containment. A carrier object may only contain
ports. However, a module object may contain references to other modules, carriers, and ports.
Containment serves two useful purposes. First, it gives the parent object, in this case the
carrier, access to information within any contained object. Second, if two different PDL
objects, perhaps two modules, contain a reference to the same port, then the two objects are
considered connected to each other through that port. Figure 5 shows the PDL definition for
a task module, which represents tasks in the task graph, and the codesign module, which
represents the entire task graph.


module codesign
  ports
    inputs {}  : task_out_port;
    outputs {} : task_in_port;
  carriers
    connections {} : edge;
  modules
    tasks {} : task;
end module;


module task
  ports
    inputs {}  : task_in_port;
    outputs {} : task_out_port;
end module;


Figure 5: PDL declaration for representing task and codesign

In the task module there are containment declarations for inputs and outputs. Within a
task graph, a task may have any number of input edges from other tasks. Additionally, a task
can also have output edges that branch to other tasks. In the declaration, the {} notation is
used to denote a set of objects, with a set containing zero or more objects. Thus, each task has
a set of input ports and a set of output ports, through which it connects to its input and
output edges. In a containment declaration, when {} is not used, the reference is to a single
object.

Module codesign is used to represent the entire task graph. It contains a definition for a set
of inputs: these are all the inputs to the task graph (there may be more than one, but usually
this is the root of the task graph). Another definition declares a set of outputs: these are
all the outputs from the task graph, which usually come from the final tasks in the task graph
to execute. In addition, there are definitions for connections, which are all the edges in the
task graph, and tasks, which are all the tasks in the graph. The codesign module thus represents
all the containment relationships existing in a graph.
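The connectivity that PDL infers from shared port references can be mimicked in a few lines. In this sketch, hypothetical Python classes mirror module task and carrier edge from Figures 4 and 5, ports are identified by name, and a directed task-to-task edge is recovered whenever a carrier's input port is among one task's outputs and its output port is among another task's inputs. The duplicate check enforces the rule that no two tasks have more than one directed edge between them; `Task`, `Edge`, and `task_graph` are our names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Task:  # mirrors module task: sets of contained port references
    name: str
    inputs: Set[str] = field(default_factory=set)   # task_in_port names
    outputs: Set[str] = field(default_factory=set)  # task_out_port names


@dataclass
class Edge:  # mirrors carrier edge: exactly one input and one output port
    name: str
    input: str   # a task_out_port name
    output: str  # a task_in_port name


def task_graph(tasks: List[Task], edges: List[Edge]) -> List[Tuple[str, str]]:
    """Recover directed task-to-task edges from shared port references."""
    out_owner: Dict[str, str] = {p: t.name for t in tasks for p in t.outputs}
    in_owner: Dict[str, str] = {p: t.name for t in tasks for p in t.inputs}
    result: List[Tuple[str, str]] = []
    for e in edges:
        src, dst = out_owner.get(e.input), in_owner.get(e.output)
        if src is None or dst is None:
            raise ValueError(f"edge {e.name} is not connected at both ends")
        if (src, dst) in result:
            raise ValueError(f"more than one edge from {src} to {dst}")
        result.append((src, dst))
    return result


tasks = [Task("T1", outputs={"o1"}),
         Task("T2", inputs={"i2"}, outputs={"o2"}),
         Task("T3", inputs={"i3"})]
edges = [Edge("e1", "o1", "i2"), Edge("e2", "o2", "i3")]
print(task_graph(tasks, edges))  # [('T1', 'T2'), ('T2', 'T3')]
```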

Once all the objects representing components in the task graph have been specified, the next
step in developing a model is to introduce attributes and attribute evaluation rules in the
objects. Attributes are parameters that are propagated and computed in the PDL model. An
evaluation rule describes how to calculate an attribute in an object. Figure 6 shows the
declarations of the port and carrier objects with all their attributes and evaluation rules.


type
  hw_sw_bind : enum {hw, sw};
end type;

port task_out_port
  attributes
    primitive num_var : int;
    t1_bind : hw_sw_bind;
    t2_bind : hw_sw_bind;
    wr_overhd : real;
    rd_overhd : real;
    dynamic done, t2_job : int := 0;
  rules
    wr_overhd = wr_comm(t1_bind, t2_bind, num_var);
    rd_overhd = rd_comm(t1_bind, t2_bind, num_var);
end port;


port task_in_port
  attributes
    t2_bind : hw_sw_bind;
    rd_overhd : real;
    dynamic done : int := 0;
    dynamic t2_job : int := 0;
end port;

carrier edge
  ports
    input  : task_out_port;
    output : task_in_port;
  rules
    input't2_bind = output't2_bind;
    input't2_job = output't2_job;
    output'rd_overhd = input'rd_overhd;
    output'done = input'done;
end carrier;


Figure 6: Attributes for edge carrier and related ports

Attributes are defined in the attributes section of an object. An attribute may be of any allowable data type; the available types include integer, real, enumerated types, heterogeneous records, lists, and various combinations of these. An attribute is associated with the object where it is declared, not with the object where it is given a value or referenced. Thus, when an attribute is used in an expression outside the object that declares it, it is referenced as object'attribute. For example, the edge carrier refers to input't2_bind: the port input is declared as a contained port of the carrier, and t2_bind is an attribute declared within that port.

Along with the type, the attributes section defines whether an attribute is primitive or non-primitive. An attribute is non-primitive unless explicitly declared primitive. A primitive attribute has no evaluation rule defining how to calculate it; instead, its value is supplied by the user when the performance model is executed. For example, in the task_out_port port the primitive attribute num_var is the number of variables transmitted from one task to another. This value cannot be calculated because it depends on the actual task graph being modeled rather than on any information within the model. Thus, when the model is executed, the user supplies the number of variables being transmitted.

In addition to being primitive or non-primitive, an attribute can be either dynamic or static. An attribute is static unless declared dynamic. During model execution, each static attribute is calculated once: static attributes do not depend on a dynamic stream of events, so a single calculation suffices because they are independent of other events occurring within the model. Conversely, a dynamic attribute is not a single value but a stream of values. As a model executes, it may be re-evaluated any number of times, and each re-evaluation cycle produces a corresponding value for every dynamic attribute. For example, if there are 5 evaluation cycles, then every dynamic attribute will have 5 values. This is analogous to simulating the performance model over a stream of events.
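The difference between static and dynamic attributes can be pictured with a small Python sketch. It assumes one static attribute and one dynamic attribute whose rule reads its own previous value. The function `run_cycles` and its arguments are hypothetical and do not reflect how the PDL evaluator actually schedules rules.

```python
from typing import Callable, List, Tuple


def run_cycles(n_cycles: int,
               static_rule: Callable[[], float],
               dynamic_rule: Callable[[float, float], float],
               initial: float) -> Tuple[float, List[float]]:
    """Evaluate one static and one dynamic attribute over n_cycles cycles."""
    static_value = static_rule()      # static: calculated exactly once
    stream: List[float] = []          # dynamic: one value per evaluation cycle
    current = initial                 # plays the role of ':= 0.0' in a declaration
    for _ in range(n_cycles):
        current = dynamic_rule(static_value, current)
        stream.append(current)
    return static_value, stream


# Example: a static execution time of 2.0 and a dynamic running total.
exec_time, totals = run_cycles(5, lambda: 2.0, lambda t, prev: prev + t, 0.0)
print(exec_time, totals)   # 2.0 [2.0, 4.0, 6.0, 8.0, 10.0]
```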

Besides declaring attributes and their types, the model must define evaluation rules for the non-primitive attributes. An attribute need not have its evaluation rule in the same object where it is declared. For instance, all the attributes of task_in_port are given values by the edge carrier object, so task_in_port itself has no evaluation rules for them. This is how information is transmitted among the objects in a PDL model: an attribute is declared in one object, another object supplies its evaluation rule, and any other object can reference the value once it has been evaluated. For example, in the edge carrier, output'done is given an evaluation rule that makes it equal to input'done. Any other object that contains the same port (for example, the task whose input port it is) can then read the value of done.

Figure 7 shows the task module with all its attributes and evaluation rules. Several evaluation rules transfer information between the inputs and outputs of the task; these attributes are used to determine when a task has been scheduled and what the bindings of the connected tasks are. Recall that if a task is in hardware there is a communication overhead which must be calculated. Attribute comm_overhd will be 0.0 if the task is not bound to hardware; otherwise it is the sum of the rd_overhd and wr_overhd times, which were calculated in the ports using equations 2 and 3. The exec_time attribute is either the hardware or the software time for the task, and the time attribute adds the execution time and the communication time to the current global_time, giving the time at which the task completes when it is scheduled. Because every dynamic attribute is re-evaluated on each evaluation cycle, a task that has not been scheduled during a given cycle does not add to the total time in that cycle. A new task is scheduled in each evaluation cycle.
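The rules of Figure 7 that decide whether a task executes in the current cycle can be transcribed into Python to make their control flow explicit. This is a sketch under stated assumptions: the port-level sums (rd_overhd, wr_overhd) and the values done_in and job_diff are passed in rather than computed from ports, comm_overhd follows the Figure 7 listing as written, and the names `TaskState` and `step` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    binding: str             # primitive: "hw" or "sw"
    hwtime: float            # primitive
    swtime: float            # primitive
    schd_no: int             # primitive: this task's slot in the software schedule
    num_jobs: int            # set by the codesign module
    rd_overhd: float = 0.0   # sum over input ports
    wr_overhd: float = 0.0   # sum over output ports
    time: float = 0.0        # dynamic
    job: int = 0             # dynamic
    done_out: int = 0        # dynamic


def step(t: TaskState, done_in: int, job_diff: int,
         schd_task: int, global_time: float) -> int:
    """Apply the exec, comm_overhd, time, job and done_out rules for one cycle."""
    exec_time = t.hwtime if t.binding == "hw" else t.swtime
    ready = done_in == 1 and t.job < t.num_jobs and job_diff < 1
    # Hardware tasks run as soon as they are ready; software tasks must
    # also have their turn in the software schedule.
    exec_ = 1 if ready and (t.binding == "hw" or t.schd_no == schd_task) else 0
    comm_overhd = 0.0 if t.binding == "hw" else t.wr_overhd + t.rd_overhd
    if exec_ == 1:
        t.time = global_time + exec_time + comm_overhd
        t.job += 1
        t.done_out = 1
    return exec_
```

A task that is not ready, or a software task whose slot does not match schd_task, leaves time, job and done_out unchanged, which corresponds to the `else time`, `else job` and `else done_out` branches.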

The last definition is the codesign module. Figure 8 shows the definition with all its attributes and evaluation rules. In the rules section, several evaluation rules set attributes in the contained task objects. The {} indicates that attribute num_jobs in every task is to be set to the primitive value num_jobs. The evaluation rule for global_time is similar to equation 4, and another rule sets global_time in each task to the current global_time. Thus, after each task is scheduled and its time calculated, a new global_time is determined by taking the maximum of the times of all the tasks.

All of these evaluation rules are written in the PDL model in no particular order. However, the rules imply an inherent order: if a rule depends on the value produced by another rule, it cannot be evaluated until that other rule has been evaluated. These dependencies among evaluation rules form a global attribute dependency graph. Figure 9 illustrates some of the dependencies among the attributes in the task module. Primitive attributes do not depend on other attributes and are the leaf nodes of the dependency graph (the attributes enclosed in boxes in the figure). The directed dependency graph must not contain any cycles among the attributes.
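One common way to turn rule dependencies into an evaluation order is a topological sort of the dependency graph, which also detects the forbidden cycles. The Python sketch below uses the standard `graphlib` module (Python 3.9 or later) on a simplified subset of the task module's dependencies. It assigns level 1 to attributes with no dependencies and otherwise one plus the largest level of any dependency. It treats self-references such as `else time` as reads of the previous cycle's value and omits them, which is an assumption, and it is not necessarily how the PDL compiler orders rules.

```python
from graphlib import CycleError, TopologicalSorter   # standard library, Python 3.9+
from typing import Dict, Iterable


def evaluation_levels(deps: Dict[str, Iterable[str]]) -> Dict[str, int]:
    """Return an evaluation level for every attribute in the dependency graph.

    deps maps an attribute to the attributes its evaluation rule reads.
    Attributes with no dependencies (e.g. primitives) get level 1; any other
    attribute gets one plus the largest level among its dependencies.
    Raises graphlib.CycleError if the rules depend on each other cyclically.
    """
    level: Dict[str, int] = {}
    for attr in TopologicalSorter(deps).static_order():   # dependencies first
        level[attr] = 1 + max((level[d] for d in deps.get(attr, ())), default=0)
    return level


# A simplified subset of the dependencies in the task module (Figure 7).
deps = {
    "exec_time": {"binding", "hwtime", "swtime"},
    "comm_overhd": {"binding", "wr_overhd", "rd_overhd"},
    "exec": {"done_in", "job", "num_jobs", "job_diff", "binding", "schd_no", "schd_task"},
    "time": {"exec", "global_time", "exec_time", "comm_overhd"},
}
print(evaluation_levels(deps)["time"])   # 3
```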
module task
ports
    inputs{}  : task_in_port;
    outputs{} : task_out_port;
attributes
    primitive binding : hw_sw_bind;
    primitive hwtime : real;
    primitive swtime : real;
    primitive schd_no : int;
    primitive dynamic schd_task : int := 0;
    dynamic time : real := 0.0;
    dynamic job : int := 0;
    dynamic job_diff : int := 0;
    dynamic done_in : int := 0;
    dynamic done_out : int := 0;
    dynamic exec : int := 0;
    dynamic global_time : real := 0.0;
    num_jobs : int;
    rd_overhd : real;
    wr_overhd : real;
    comm_overhd : real;
    exec_time : real;
rules
    inputs{}'t2_bind = binding;
    inputs{}'t2_job = curr job;
    outputs{}'t1_bind = binding;
    exec_time = if (binding == hw)
                then hwtime
                else swtime
                endif;
    done_in =
        eval(foreach p in inputs{ curr p'done });
    job_diff = curr job -
        min(foreach p in outputs{ p't2_job });
    rd_overhd =
        sum(foreach p in inputs{ p'rd_overhd });
    wr_overhd =
        sum(foreach p in outputs{ p'wr_overhd });

    exec =
        begin
            temp : int;

            if ((done_in == 1) and (curr job < num_jobs)
                and (job_diff < 1))
            then if (binding == hw)
                 then temp := 1;
                 else if (schd_no == schd_task)
                      then temp := 1;
                      else temp := 0;
                      endif;
                 endif;
            else temp := 0;
            endif;

            return temp;
        end;

    comm_overhd = if (binding == hw)
                  then 0.0
                  else wr_overhd + rd_overhd
                  endif;

    time = if (exec == 1)
           then global_time + exec_time + comm_overhd
           else time
           endif;

    job = if (exec == 1)
          then job + 1
          else job
          endif;

    done_out = if (exec == 1)
               then 1
               else done_out
               endif;

    outputs{}'done = done_out;
end module;


Figure  7:  Attributes  for  the  task  module 
module codesign
ports
    inputs{}  : task_out_port;
    outputs{} : task_in_port;
carriers
    connections{} : edge;
modules
    tasks{} : task;
attributes
    primitive num_jobs : int;
    primitive dynamic schd_task : int := 0;
    dynamic global_time : real := 0.0;
rules
    tasks{}'num_jobs = num_jobs;
    tasks{}'schd_task = schd_task;
    global_time =
        max(foreach t in tasks{ t'time });
    tasks{}'global_time = curr global_time;
end module;


Figure  8:  Attributes  for  the  codesign  module 


Figure  9:  Example  Dependency  Graph 

4  Tradeoff  Analysis  Using  the  PDL  System 

Once a PDL model is written, the next step is to compile it. As mentioned previously, a PDL model by itself is not an executable model; an executable model is generated only when the PDL model is coupled with a specific design. This is the role of the compiler: it takes a PDL model and a design (in our example, a specific task graph) as input and generates an executable performance model. During performance model generation, the evaluation order of all the attributes is determined, so the result of compilation is an executable performance model with a correct evaluation order for all expressions. Figure 10 gives a detailed overview of the PDL system and the flow of a PDL program and net-list through an analysis cycle.

Once a performance model has been generated, the user executes it with the PDL evaluator. Model evaluation can be done in several ways, depending on the configuration of and input to the evaluator tool. A performance model can be completely evaluated when all the primitive attribute values are supplied; this is known as full evaluation. Another option is to evaluate the performance model with only some of the primitive data specified; this is known as partial evaluation. In addition, the evaluator can be configured to collect data for analysis over ranges of values for primitive attributes instead of single values. Finally, the evaluator can be linked into an existing CAD/CAE tool to perform data analysis.

Figure 10: PDL System Overview

Full evaluation starts from a generated performance model. Every primitive attribute defined in the PDL model must be given a value by the user in an input data file. When the evaluator is invoked, both the performance model and the data file are read, all expressions are evaluated, and the results are written to another performance model. Since the model was fully evaluated, every evaluation rule will have been replaced with its result, so the resulting performance model contains nothing but attributes and their evaluated values.

In addition to full evaluation, the user can evaluate a model partially or incrementally. Instead of specifying values for all primitive attributes, the user can specify only some of them. When the evaluator is invoked, the performance model and data file are read, all evaluation rules are partially evaluated, and a residual performance model is generated. The residual model still contains (partially evaluated) evaluation rules for some attributes, reduced and simplified with respect to the original expressions. The residual model can be evaluated further when more primitive attribute data becomes available.
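The idea of a residual model can be illustrated with a toy partial evaluator for arithmetic rules. This Python sketch is an illustration under simplifying assumptions, not the PDL evaluator: expressions are nested tuples, only `+`, `-` and `*` are supported, and the function `peval` is hypothetical. Known primitives are substituted and fully known subexpressions are folded into constants, while anything that involves an unspecified attribute is kept as a residual expression. The example rule resembles the time rule, parenthesized so that the known part can fold.

```python
from typing import Dict, Union

# A rule expression is a number, an attribute name, or a tuple (op, left, right).
Expr = Union[int, float, str, tuple]

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def peval(e: Expr, known: Dict[str, float]) -> Expr:
    """Partially evaluate e: substitute known attributes and fold constants."""
    if isinstance(e, str):
        return known.get(e, e)                # unknown attributes stay symbolic
    if isinstance(e, tuple):
        op, left, right = e
        left, right = peval(left, known), peval(right, known)
        if isinstance(left, (int, float)) and isinstance(right, (int, float)):
            return OPS[op](left, right)       # both operands known: compute now
        return (op, left, right)              # otherwise keep a residual expression
    return e                                  # numeric constant


rule = ("+", "global_time", ("+", "exec_time", "comm_overhd"))
residual = peval(rule, {"exec_time": 8, "comm_overhd": 3})
print(residual)                                # ('+', 'global_time', 11)
print(peval(residual, {"global_time": 100}))   # 111
```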

Partial evaluation is useful in several situations during tradeoff analysis. During analysis, the designer may want to study the effect of a few primitive attributes on the design's performance. Evaluating a large model many times with primitive attributes that do not change between successive evaluations can be costly. The solution is to partially evaluate the performance model with only those primitive attributes that do not change. All evaluation rules are evaluated and, whenever possible, reduced so that they depend only on the primitive attributes that have not been specified. The result is a simpler performance model that can then be used in successive evaluations with the remaining primitive attributes specified. This eliminates redundant evaluation of unchanging evaluation rules and speeds up data collection.

In addition to evaluating a model with single data points, the evaluator can be configured to collect data over ranges of primitive values. For example, instead of setting a primitive attribute to one particular value, the user can specify a range of values for it; during execution, the model is then evaluated with the attribute set to each value in the range. Any number of primitive attributes can be set up to have ranges of values. In addition, the evaluator can be configured to collect data on any attribute within the model, primitive or non-primitive. Two types of data collection are possible: data can be collected in a format suitable for two- or three-dimensional plots, or the evaluator can collect data for any number of attributes and store the results in tabular form.
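Collecting data over ranges of primitive values amounts to evaluating the model once per point of the Cartesian product of the ranges. The Python sketch below assumes the model can be called as a function from primitive values to attribute values; `sweep` and `toy_model` are hypothetical names, and each returned row corresponds to one line of the tabular form of data collection.

```python
import itertools
from typing import Callable, Dict, Iterable, List, Tuple


def sweep(model: Callable[[Dict[str, float]], Dict[str, float]],
          fixed: Dict[str, float],
          ranges: Dict[str, Iterable[float]],
          collect: List[str]) -> List[Tuple[float, ...]]:
    """Evaluate model once for every combination of the ranged primitives.

    Each returned row holds the ranged primitive values followed by the
    collected attributes, i.e. one line of a results table.
    """
    names = list(ranges)
    rows = []
    for combo in itertools.product(*(list(ranges[n]) for n in names)):
        primitives = {**fixed, **dict(zip(names, combo))}
        results = model(primitives)
        rows.append(combo + tuple(results[c] for c in collect))
    return rows


def toy_model(p: Dict[str, float]) -> Dict[str, float]:
    # Hypothetical stand-in for an executable performance model.
    return {"total": p["num_jobs"] * (p["exec_time"] + p["comm_overhd"])}


for row in sweep(toy_model, {"exec_time": 8},
                 {"num_jobs": [1, 2], "comm_overhd": [0, 1]}, ["total"]):
    print(row)
```

Collecting two ranged primitives and one result per row gives exactly the three columns needed for a three-dimensional plot.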

When a designer needs to collect data in a way that the evaluator does not handle, the evaluator is also available as a C run-time library. The library contains several functions that together constitute a procedural interface to the evaluator. With these functions, users can set up a performance model and collect data in whatever form suits their needs: the library can be used to read a performance model, set values for attributes, evaluate the model, restore the performance model to a previous state, and perform many other activities.


5  Tradeoff  Analysis  for  a  Codesign  Example

The  co-design  model  written  in  PDL  and  discussed  in  Section  3  is  flexible  enough  to  perform 
performance  modeling  for  many  different  types  of  task  graphs.  In  addition,  any  number  of  tasks 
can  be  bound  to  hardware  or  software.  In  this  section,  we  discuss  results  of  using  this  model  for 
a  specific  codesign  example  involving  a  JPEG-like  compression/decompression  scheme  [5,  6]. 
The  target  architecture  was  a  coprocessor  system.  Tradeoff  analysis  was  performed  with  the 
PDL  model  to  determine  which  task  to  implement  in  hardware.  Figure  11  shows  the  task 
graph  for  the  compression  part  of  the  JPEG  algorithm  in  terms  of  objects  in  the  PDL  model. 
Arrows  in  the  figure  represent  the  connectivity  among  the  various  PDL  objects. 


Figure  11:  Task  Graph  for  JPEG 

The first step in performing tradeoff analysis was to estimate the hardware and software times for each task. Software times were obtained by using existing software profiling tools to time each task in the software version of the JPEG algorithm. All software times were collected with timing functions on a Pentium system with a P100 microprocessor, 256 kilobytes of standard cache, and 16 megabytes of main memory. Hardware execution times were estimated with a synthesis tool [7] that generated a register transfer level (RTL) design for each task and also estimated the execution time of each RTL design. Times were estimated for a 2 micron CMOS technology. Table 1 shows the estimated hardware and measured software execution times for the JPEG tasks. Each time is for the task performing its job on 16 pixels at a time.


Task                         Hardware (µs)    Software (µs)
DCT                          8.4              371.3
Quantization                 0.6              7.56
ZigZag                       0.4              1.63
RLE and Huffman Encoding     884              18.48

Table 1: Estimated Task Times


Several aspects of a task affect its behavior in hardware or software. Mathematically intensive tasks tend to perform better in hardware than in software because of the hardware optimizations made by the synthesis tool. However, tasks that contain many control and data flow statements are better suited to software, because the synthesized control hardware is far more complex than its software counterpart. The execution times in Table 1 illustrate these facts. The DCT (Discrete Cosine Transform) is almost entirely mathematical and as such performs better in hardware than in software. Huffman encoding, in contrast, is a control-oriented algorithm containing very few arithmetic operations, and the RLE and Huffman encoding task runs far faster in software (18.48 µs) than in hardware (884 µs).

The next step in the analysis process was to use the PDL model to determine the execution time of the task graph with each task, in turn, bound to hardware. This was done by compiling the PDL program with the design for the JPEG task graph. Four data files were created as input to the evaluator, each binding a single task to hardware. The model was set up to estimate the execution time for an input file containing 4080 pixels. In addition, all task schedules were defined for pipelining, since the PDL model was written to account for it. Table 2 shows the results of evaluating the model with these four data files.


Task in Hardware             Execution Time (s)
DCT                          0.234
Quantization                 1.51
ZigZag                       1.54
RLE and Huffman Encoding     4.07

Table 2: PDL Results for Task Bindings


The results in Table 2 show that the DCT task was the best choice for implementation on the coprocessor hardware. We implemented the DCT task in a coprocessor system [3] connected to a Pentium-based PC. Once the system was complete, actual execution times for compressing images of different sizes were measured, and the PDL performance model was evaluated with primitive attributes set for each of the input images. Table 3 compares the PDL estimates with the coprocessor execution times.




File Size          PDL Estimated    Actual      % Error
(no. of pixels)    Time (s)         Time (s)    (relative to actual)
18,048             2.23             2.11        5.7
25,920             3.09             3.02        2.3
54,896             6.13             6.51        5.8
69,840             7.43             8.11        8.3
87,552             9.16             10.16       9.8

Table 3: Comparison of Estimated to Actual Execution Times
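The % Error column can be checked directly: it is the absolute difference between the estimated and actual times divided by the actual time. The short Python check below recomputes it from the rounded times in Table 3. The recomputed values agree with the printed column to within 0.1 percentage point. `percent_error` and `TABLE3` are names introduced here for illustration.

```python
def percent_error(estimated: float, actual: float) -> float:
    """Error of a PDL estimate relative to the measured time, in percent."""
    return abs(estimated - actual) / actual * 100.0


# (pixels, PDL estimate in s, measured time in s, % error as printed in Table 3)
TABLE3 = [
    (18048, 2.23, 2.11, 5.7),
    (25920, 3.09, 3.02, 2.3),
    (54896, 6.13, 6.51, 5.8),
    (69840, 7.43, 8.11, 8.3),
    (87552, 9.16, 10.16, 9.8),
]

for pixels, est, act, printed in TABLE3:
    print(f"{pixels:>6} pixels: {percent_error(est, act):.2f}% (table: {printed}%)")
```

The signed differences also show that the model overestimates the two smallest images and underestimates the three largest.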


6  Conclusion 

We have presented a performance modeling and analysis approach for codesigns using the PDL system. In PDL, it is straightforward to enhance the codesign model presented in this paper so that it requires less primitive input information or considers performance parameters other than execution time. For example, the model could determine a task schedule from the hardware/software bindings, include more detail about the target architecture, or incorporate estimates of cost, hardware area, and so forth. As the design evolves and requires more detailed performance analysis, the PDL model can likewise evolve and be refined into a more accurate representation of the system being modeled.

It is important to note that the PDL model is a generic model from which specific, executable performance models can be generated (using the PDL compiler) given a specific task graph. Thus, the PDL model applies to any task graph that follows the object construction scheme specified in the PDL program. This is the essential difference between performance modeling in PDL and in a procedural hardware description language such as VHDL. More information on the PDL language and system, including the system software, is available from the PDL home page on the WWW at http://www.ece.uc.edu/~ddel/pdl.html.
Performance  Verification  Using  Partial  Evaluation  and  Interval  Analysis


Abstract 

Performance models, usually written in high-level programming languages or high-level hardware description languages, make full use of high-level procedural constructs such as the assignment statement, the if-then, case and while control constructs, and procedure calls. We propose a partial evaluation procedure to reduce procedural performance models to an equational form. We then propose an interval-analysis-based method to formally determine whether the reduced performance model satisfies a set of relational constraints on the performance attributes. Together, the partial evaluation and interval analysis procedures constitute a powerful approach to formal performance verification. We illustrate this through examples and describe both techniques in detail. Also included are results for an implementation of a symbolic partial evaluator of performance models and a performance verification tool based on the interval analysis technique.
1  Introduction

System designs consist of a hierarchical collection of modules with ports connected by nets. The performance of a system is described by a collection of attributes attached to the various objects (modules, ports and nets) in the design. A performance model is an executable specification in which some of these attributes (the computed attributes) are specified in terms of other attributes. Usually, performance models are written in a hardware description language or a high-level programming language using the full power of the procedural programming constructs these languages provide, such as the assignment statement, conditional and iterative statements, and function calls. In this paper, we refer to these models as procedural performance models.

For example, Figure 1 shows a combinational logic design and Figure 2 shows a procedural performance model for computing CMOS dynamic power dissipation based on input signal probabilities [1]. The model lists the primitive and the computed performance attributes. Attributes are attached to each entity in the design and are referenced using the notation ObjectName'AttributeName. In this example, all attributes are assumed to be real valued. This model is procedural because it contains function calls, which in turn contain variable assignment statements inside loops.
Figure 1: Example Design Net List


The performance verification problem is to determine whether a performance model can simultaneously satisfy a set of relational constraints placed on the performance attributes. The performance verification problem is known to be undecidable for procedural performance models [2, 3]. In this paper, we show how equational performance models can be verified using an interval-based analysis technique. An equational performance model consists of equations, one for each computed attribute, expressed in terms of other attributes using a predefined set of mathematical operators. An equational model does not contain any programming constructs such as function calls, conditional and iterative statements, and so forth. Furthermore, as will be discussed in Section 3, the mathematical operators in the equations must be invertible, in the sense that each operation must have an inverse operation.

Although the operators used in basic expressions in procedural performance models are usually invertible, many other constructs are not. For example, variable assignment is non-invertible. Control constructs such as case and while statements, and function calls, are also non-invertible in the presence of the assignment statement. In some special cases a procedural performance model fragment may be invertible, but establishing that it is invertible is quite hard and requires detailed mathematical analysis.

The question, then, is how to transform a procedural performance model into an equational model when sufficient primitive attribute data is available. This paper addresses this question as well, and develops a partial evaluation [4, 5] technique to reduce procedural performance models to equational form. Once reduced, these models can be subjected to formal performance verification using the interval analysis technique. Figure 3 shows the process of performance verification using partial evaluation followed by interval analysis.

Primitive Attributes
    ckt'system_freq
    g1'capacitance
    g2'capacitance
    g3'capacitance
    g4'capacitance
    ckt'voltage
    n1'prob
    n2'prob
    n3'prob
    n4'prob

Computed Attributes
    n5'prob = calc_and_prob([n1'prob, n2'prob])
    n6'prob = calc_or_prob([n3'prob, n4'prob])
    n7'prob = calc_and_prob([n5'prob, n6'prob])
    n8'prob = calc_or_prob([n6'prob, n7'prob])
    g1'freq = min([n5'prob, 1.0 - n5'prob]) * ckt'system_freq * 2.0
    g1'power = (ckt'voltage**2 * g1'capacitance * g1'freq) / 2.0
    g2'freq = min([n6'prob, 1.0 - n6'prob]) * ckt'system_freq * 2.0
    g2'power = (ckt'voltage**2 * g2'capacitance * g2'freq) / 2.0
    g3'freq = min([n7'prob, 1.0 - n7'prob]) * ckt'system_freq * 2.0
    g3'power = (ckt'voltage**2 * g3'capacitance * g3'freq) / 2.0
    g4'freq = min([n8'prob, 1.0 - n8'prob]) * ckt'system_freq * 2.0
    g4'power = (ckt'voltage**2 * g4'capacitance * g4'freq) / 2.0
    ckt'power = g1'power + g2'power + g3'power + g4'power

function min(vals[ ])
begin
    temp := vals[1]
    foreach v in vals
        { if (temp > v) then
              temp := v }
    return temp
end

function calc_and_prob(vals[ ])
begin
    temp := 1.0
    foreach prob in vals
        { temp := temp * prob }
    return temp
end

function calc_or_prob(vals[ ])
begin
    temp := 1.0
    foreach prob in vals
        { temp := temp * (1.0 - prob) }
    return 1.0 - temp
end

Figure 2: Procedural Performance Model for Dynamic Power

Figure 3: Performance Evaluation and Verification

Computed Attributes
    n5'prob = 0.25
    n6'prob = 0.75
    n7'prob = 0.1875
    n8'prob = 0.796875
    g1'freq = 0.5 * ckt'system_freq
    g1'power = (ckt'voltage**2 * g1'capacitance * g1'freq) / 2.0
    g2'freq = 0.5 * ckt'system_freq
    g2'power = (ckt'voltage**2 * g2'capacitance * g2'freq) / 2.0
    g3'freq = 0.375 * ckt'system_freq
    g3'power = (ckt'voltage**2 * g3'capacitance * g3'freq) / 2.0
    g4'freq = 0.40625 * ckt'system_freq
    g4'power = (ckt'voltage**2 * g4'capacitance * g4'freq) / 2.0
    ckt'power = g1'power + g2'power + g3'power + g4'power

Figure 4: Equational Performance Model for Dynamic Power

For example, Figure 4 shows an equational model obtained by reducing the procedural model of Figure 2: all input signal probabilities are set to 0.5 (high and low signal values are equally likely) and the model is partially evaluated. This equational model can now be subjected to formal performance verification based on the interval analysis technique.
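For these particular inputs, partial evaluation folds the probability and switching-frequency part of Figure 2 into constants. The Python sketch below performs that concrete computation; it is not a symbolic partial evaluator. Function names mirror Figure 2, `freq_coefficients` is a hypothetical name, and the sketch assumes independent input signals, as the probability formulas do. Both accumulators in `calc_and_prob` and `calc_or_prob` start at 1.0, which is what reproduces the coefficients in Figure 4.

```python
from typing import Dict, List


def calc_and_prob(vals: List[float]) -> float:
    """Probability that all (independent) inputs are 1."""
    temp = 1.0
    for prob in vals:
        temp = temp * prob
    return temp


def calc_or_prob(vals: List[float]) -> float:
    """Probability that at least one (independent) input is 1."""
    temp = 1.0                         # probability that every input is 0
    for prob in vals:
        temp = temp * (1.0 - prob)
    return 1.0 - temp


def freq_coefficients(n1: float, n2: float, n3: float, n4: float) -> Dict[str, float]:
    """Fold the probability part of Figure 2 for fixed input probabilities.

    Returns k for each gate such that g'freq = k * ckt'system_freq.
    """
    n5 = calc_and_prob([n1, n2])
    n6 = calc_or_prob([n3, n4])
    n7 = calc_and_prob([n5, n6])
    n8 = calc_or_prob([n6, n7])
    return {gate: min(p, 1.0 - p) * 2.0
            for gate, p in (("g1", n5), ("g2", n6), ("g3", n7), ("g4", n8))}


print(freq_coefficients(0.5, 0.5, 0.5, 0.5))
# {'g1': 0.5, 'g2': 0.5, 'g3': 0.375, 'g4': 0.40625}
```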

For example, once the model is reduced to equational form, the question of whether the constraints 10.0 < g1'capacitance < 25.0, 5.0 < g2'capacitance < 10.0, 5.0 < g3'capacitance < 20.0, 8.0 < g4'capacitance < 15.0, 1.0 < ckt'voltage < 5.0 and 10.0 < ckt'system_freq < 30.0 imply 0.0 < ckt'power < 12000 can be answered affirmatively. The question of whether the same capacitance constraints, together with 3.3 < ckt'voltage < 3.5 and 30.0 < ckt'system_freq < 50.0, imply 0.0 < ckt'power < 2000 can be answered negatively. Of course, the verification is valid only for the partial data with which the model was partially evaluated. This approach is analogous to the use of symbolic simulation followed by boolean tautology checking for verifying logic circuits [6].
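For this model, plain interval arithmetic is enough to reproduce both answers. The Python sketch below treats each constraint as a closed interval (an assumption) and hard-codes the coefficients of Figure 4. It is a simplified stand-in for the verification technique of Section 3, not that technique itself, and the helper names are hypothetical. Every attribute here is positive and ckt'power increases with each of them, so the computed bounds are exact. They are [63.125, 11660.16] for the first question, which lies inside [0, 12000], and roughly [2062, 9522] for the second, which exceeds 2000. In general, interval arithmetic can overestimate ranges, and a bound outside the target range is then inconclusive rather than a proof of violation.

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]          # closed interval [lo, hi]

# Coefficients k in g'freq = k * ckt'system_freq, taken from Figure 4.
COEFF = {"g1": 0.5, "g2": 0.5, "g3": 0.375, "g4": 0.40625}


def i_add(a: Interval, b: Interval) -> Interval:
    return (a[0] + b[0], a[1] + b[1])


def i_mul(a: Interval, b: Interval) -> Interval:
    products = [x * y for x in a for y in b]
    return (min(products), max(products))


def power_bounds(cap: Dict[str, Interval], voltage: Interval,
                 system_freq: Interval) -> Interval:
    """Interval enclosure of ckt'power for the equational model of Figure 4."""
    v_squared = i_mul(voltage, voltage)   # exact here because voltage > 0
    total: Interval = (0.0, 0.0)
    for gate, k in COEFF.items():
        freq = i_mul((k, k), system_freq)
        power = i_mul((0.5, 0.5), i_mul(i_mul(v_squared, cap[gate]), freq))
        total = i_add(total, power)
    return total


def implies(bounds: Interval, lo: float, hi: float) -> bool:
    """True if every value in bounds satisfies lo <= x <= hi."""
    return lo <= bounds[0] and bounds[1] <= hi


caps = {"g1": (10.0, 25.0), "g2": (5.0, 10.0), "g3": (5.0, 20.0), "g4": (8.0, 15.0)}
print(implies(power_bounds(caps, (1.0, 5.0), (10.0, 30.0)), 0.0, 12000.0))  # True
print(implies(power_bounds(caps, (3.3, 3.5), (30.0, 50.0)), 0.0, 2000.0))   # False
```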

The rest of this paper is organized as follows. Section 2 introduces a notation for writing procedural performance models and describes a procedure for partially evaluating procedural performance models given partial primitive attribute data. Performance models written in this notation can easily be embedded into high-level programming or hardware description languages. Additionally, when sufficient primitive data is available, the reduced models can be rendered in equational form. Section 3 describes our performance verification technique for equational models, which is based on interval mathematics. Section 4 presents experimental results showing typical partial evaluation and verification times for some performance models. Section 5 contains concluding remarks.


2  Partial  Evaluation  of  Procedural  Performance  Models 

Conceptually, a performance model is specified by augmenting a (possibly hierarchical) net-list
with attributes and attribute evaluation rules [7, 8]. An attribute represents some design
parameter such as voltage, power consumption, time delay, and so forth. An attribute can be
either primitive or computed. Primitive attributes are assigned a value by the user, whereas
computed attributes are defined by an evaluation rule that assigns an expression to the
attribute. Evaluation rules can use many different forms of expressions, which are described in
the following paragraphs. For uniformity of presentation, we assume that all attributes are
real valued, although the partial evaluation and verification techniques presented in this paper
are fully capable of handling integers and enumerated types, including booleans and bits.

Figure 5 shows an algorithm for partially evaluating a performance model. ASet is a set
containing all of the attributes in the model. Prior to partial evaluation, the evaluation order of all
attributes has to be determined. A computed attribute has an expression that defines how to
calculate the value of the attribute. This expression typically depends upon the values of other
attributes in the performance model. For example, if the two rules are x = y + 5 and y = 5, the
rule for x cannot be evaluated until y has been. An attribute that depends upon no other
attribute is given an evaluation order of 1. Every other attribute is assigned an
evaluation order equal to one plus the largest evaluation order of any attribute upon which it
depends. In the previous example, y has an order of one and x has an order of
two.
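The ordering rule just described is a longest-path layering of the attribute dependency graph. The Python sketch below is a minimal illustration under stated assumptions: `evaluation_order` and its `deps` argument (a map from each attribute name to the set of attribute names its rule reads) are hypothetical names introduced here, and the dependency graph is assumed to be acyclic, so a cycle raises an error.

```python
def evaluation_order(deps):
    """Return {attribute: evaluation order} for an acyclic dependency map.

    deps maps each attribute name to the set of attribute names its
    evaluation rule reads. Attributes with no inputs get order 1; every
    other attribute gets 1 + the largest order among its inputs.
    """
    order = {}

    def visit(attr, path):
        if attr in order:
            return order[attr]
        if attr in path:
            raise ValueError(f"cyclic dependency involving {attr!r}")
        inputs = deps.get(attr, set())
        level = 1 + max((visit(d, path | {attr}) for d in inputs), default=0)
        order[attr] = level
        return level

    for attr in deps:
        visit(attr, frozenset())
    return order


# The two rules from the text: x = y + 5 and y = 5.
print(sorted(evaluation_order({"x": {"y"}, "y": set()}).items()))
# [('x', 2), ('y', 1)]
```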
EVALUATE-MODEL(ASet)
begin
    Determine_Evaluation_Order(ASet)
    Tstep ← 1
    while (Tstep is less than or equal to the largest evaluation order)
        OSet ← {All attributes in ASet with order equal to Tstep}
        for each A in OSet
            E ← Evaluation_Expression(A)
            E ← PartialEval(E)
        end for
        Tstep ← Tstep + 1
    end while
end


Figure 5: Partial Evaluation Algorithm

Using the evaluation order, each attribute in the model is evaluated, beginning with all attributes
of order 1. The function PartialEval() substitutes all known attribute values into each
evaluation rule and reduces the rule as much as possible.
An attribute is considered known if it has a single real value. An attribute with an evaluation
rule is considered unknown until the evaluation rule can be evaluated to a single value.

The following paragraphs describe in detail how to partially evaluate the various constructs. The
constructs discussed in this section are available in virtually all high-level procedural
programming languages and hardware description languages. Performance models can be written directly
using such languages, or alternatively, they can be automatically extracted,
given the design net-list, from generic performance models written in a performance modeling
language such as PDL [7]. Instead of selecting an existing language, we use this general notation
to emphasize that the partial evaluation technique described in this paper can be used with
performance models written in many existing languages.

Mathematical  Expressions  :  Every  mathematical  expression  is  parsed  into  an  expression 
tree  with  nodes  in  the  tree  representing  operations,  real  values,  or  references  to  other  attributes. 
The  evaluation  process  recursively  traverses  the  tree  replacing  nodes  with  real  values  whenever 
possible. 

attr = unary-operator (V1 = PartialEval(expr))

attr = (V1 = PartialEval(left-expr)) binary-operator (V2 = PartialEval(right-expr))
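For mathematical expressions, PartialEval amounts to substituting known attribute values and folding every subtree whose operands are all known. The Python sketch below illustrates this under assumed conventions that are not the paper's: numbers are constants, strings are attribute references, tuples `(op, arg, ...)` are operator nodes, `env` holds the known attribute values, and the operator names in `BINARY` and `UNARY` are hypothetical.

```python
import math
import operator

# Hypothetical encoding: int/float = constant, str = attribute reference,
# (op, arg, ...) = operator node.
BINARY = {"+": operator.add, "-": operator.sub, "*": operator.mul,
          "/": operator.truediv, "**": operator.pow}
UNARY = {"neg": operator.neg, "exp": math.exp, "log": math.log}


def is_known(value):
    return isinstance(value, (int, float))


def partial_eval(expr, env):
    """Substitute known attribute values (env) and fold constant subtrees."""
    if is_known(expr):
        return expr
    if isinstance(expr, str):
        return env.get(expr, expr)        # known value or residual reference
    op, *args = expr
    vals = [partial_eval(a, env) for a in args]
    if all(is_known(v) for v in vals):    # every operand known: compute it
        fn = UNARY[op] if len(vals) == 1 else BINARY[op]
        return fn(*vals)
    return (op, *vals)                    # residual node, known parts folded


rule = ("+", ("*", "c", "d"), ("/", "a", "b"))
residual = partial_eval(rule, {"a": 6.0, "b": 2.0})
print(residual)                                      # ('+', ('*', 'c', 'd'), 3.0)
print(partial_eval(residual, {"c": 2.0, "d": 4.0}))  # 11.0
```

Calling `partial_eval` again on a residual with more data supplied finishes the job, which mirrors how a reduced model becomes fully evaluated once all primitive data is known.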

If-Then-Else Expressions or statements : As shown below, each part of the if-then-else
expression is evaluated first. When the conditional expression is known, the entire if-then-else
expression or statement can be replaced by either the true or the false branch, depending upon the
boolean value of the conditional expression. When the conditional does not evaluate to a known
value, the only operation performed is substitution of known values where possible.

attr = if (V1 = PartialEval(conditional-expr)) then
    (V2 = PartialEval(true-expr))
else
    (V3 = PartialEval(false-expr))
endif
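The if-then-else rule can be sketched in the same style. This Python fragment is an illustration only: the tuple encoding, the name `pe` and the small operator table are assumptions. All three parts are partially evaluated, and the node collapses to one branch only when the condition becomes a known boolean.

```python
import operator

# Hypothetical encoding: numbers are constants, strings are attribute
# references, tuples are (op, args...). Only a few operators are included.
OPS = {"+": operator.add, "*": operator.mul,
       "<": operator.lt, ">": operator.gt}


def known(v):
    return isinstance(v, (int, float))   # bool results count as known too


def pe(expr, env):
    if known(expr):
        return expr
    if isinstance(expr, str):
        return env.get(expr, expr)
    op, *args = expr
    if op == "if":
        # All three parts are partially evaluated first.
        cond, then_e, else_e = (pe(a, env) for a in args)
        if known(cond):                      # condition decided: keep one branch
            return then_e if cond else else_e
        return ("if", cond, then_e, else_e)  # only substitutions applied
    vals = [pe(a, env) for a in args]
    if all(known(v) for v in vals):
        return OPS[op](*vals)
    return (op, *vals)


rule = ("if", ("<", "v", 3.3), ("*", "c", 2), ("*", "c", 3))
print(pe(rule, {"v": 5.0}))   # ('*', 'c', 3)
print(pe(rule, {"c": 10}))    # ('if', ('<', 'v', 3.3), 20, 30)
```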
Case Expressions or statements : This is very similar to the if-then-else expression
or statement. All expressions within the case expression are evaluated. When both the switch
expression and a matching expression are known, the entire case statement can be replaced with
the corresponding arm expression or statement. Otherwise, only the values of
attributes that are known can be substituted.

attr = case (V1 = PartialEval(switch-expr)) of
    (V2 = PartialEval(match-expr)) : (V3 = PartialEval(arm-expr))
    (V4 = PartialEval(match-expr)) : (V5 = PartialEval(arm-expr))
    (V6 = PartialEval(match-expr)) : (V7 = PartialEval(arm-expr))
    others : (Vx = PartialEval(other-expr))
end case
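A comparable Python sketch for case expressions follows. For brevity it assumes a hypothetical encoding in which the switch, match and arm values are constants or attribute names, and `pe_case` is a name introduced here. It resolves one situation conservatively: if an unknown match expression comes before the first matching one, the arm cannot be chosen, so a residual case is returned. When every match is known and none equals the switch value, the others arm is selected.

```python
def known(v):
    return isinstance(v, (int, float))


def pe_case(switch, arms, others, env):
    """Partially evaluate `case switch of m1 : e1 ... others : e_other`.

    switch, every match m_i and every arm e_i are constants or attribute
    names (a deliberately tiny, hypothetical encoding); env holds the
    known attribute values.
    """
    def sub(e):
        return env.get(e, e) if isinstance(e, str) else e

    s = sub(switch)
    arms = [(sub(m), sub(e)) for m, e in arms]
    other = sub(others)
    if known(s):
        for m, e in arms:
            if not known(m):
                break            # unknown match before any hit: undecidable
            if m == s:
                return e         # replace the case by the matching arm
        else:
            return other         # all matches known, none equal: others arm
    return ("case", s, arms, other)  # residual with known values substituted


arms = [(0, "p_idle"), (1, "p_run")]
print(pe_case("mode", arms, "p_off", {"mode": 1, "p_run": 4.5}))  # 4.5
```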

Foreach Expressions or statements : There are two different types of foreach expressions or
statements. One type of foreach contains a loop variable that iterates over a range of values from
one value to another by a specified step size. When the left and right range expressions
are known, the foreach expression or statement can be unrolled and replaced by copies of the
foreach body, with the loop variable in each copy replaced by the respective value. When
either end of the range is unknown, only references to known attributes in the foreach body
can be replaced.

attr = foreach var in (V1 = PartialEval(left-expr)) to
    (V2 = PartialEval(right-expr)) by (V3 = PartialEval(step-size))
    { (V4 = PartialEval(body-expr)) }

The other type of foreach expression iterates over a list of variables, values, or a combination of
both. Partial evaluation here is similar to that of the first type of foreach expression or statement.

attr = foreach var in iterate-list
    { (V1 = PartialEval(body-expr)) }
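Unrolling the range form of foreach can be sketched as follows. This is a hypothetical Python illustration: `unroll` and `substitute` are names introduced here, the loop body is a tuple expression in an assumed encoding, and the sketch also requires a known, non-zero step size before it unrolls.

```python
def known(v):
    return isinstance(v, (int, float))


def substitute(expr, mapping):
    """Replace attribute/variable references that appear in mapping."""
    if isinstance(expr, str):
        return mapping.get(expr, expr)
    if isinstance(expr, tuple):
        op, *args = expr
        return (op, *(substitute(a, mapping) for a in args))
    return expr


def unroll(var, left, right, step, body, env):
    """Partially evaluate `foreach var in left to right by step { body }`."""
    left, right, step = (substitute(x, env) for x in (left, right, step))
    body = substitute(body, env)                  # known attributes in the body
    if not all(known(x) for x in (left, right, step)):
        return ("foreach", var, left, right, step, body)   # cannot unroll
    if step == 0:
        raise ValueError("step size must be non-zero")
    copies, v = [], left
    while (v <= right) if step > 0 else (v >= right):
        copies.append(substitute(body, {var: v}))  # one body copy per value
        v += step
    return copies


print(unroll("i", 1, "n", 1, ("*", "w", "i"), {"n": 3, "w": 2.0}))
# [('*', 2.0, 1), ('*', 2.0, 2), ('*', 2.0, 3)]
```

The list form of foreach unrolls the same way once every element of the iterate-list is known.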


Begin-End  Sections  :  The  process  for  evaluating  the  begin-end  expression  begins  by  setting 
a  temporary  fail  flag  to  false.  Each  variable  declaration  statement  is  evaluated  along  with  the 
initial  value  if  there  is  one.  If  any  of  the  variable  declaration  statements  evaluate  to  unknown, 
the  fail  flag  is  set  to  true. 

Then each programming statement in the begin-end expression is evaluated. If the
evaluation of a statement yields an unknown result, the fail flag is set to true. When a return
statement is reached, the return expression is evaluated first.
If that value is known and the fail flag is still false, then the entire begin-end expression can
be replaced by the residual return expression. However, if the fail flag is true, a
previous statement did not completely evaluate, so the begin-end expression cannot be replaced.
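The fail-flag bookkeeping for begin-end sections can be sketched as below. It is a minimal Python illustration with assumed encodings, not the authors' code: a block is a list of `("let", var, expr)` statements ending with `("return", expr)`, `pe_block` is a hypothetical name, and only +, - and * are supported.

```python
import operator

# Hypothetical encoding: numbers are constants, strings are references,
# (op, a, b) is an arithmetic node; a block is a list of
# ("let", var, expr) statements ending in ("return", expr).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def known(v):
    return isinstance(v, (int, float))


def pe(expr, env):
    if isinstance(expr, str):
        return env.get(expr, expr)
    if isinstance(expr, tuple):
        op, *args = expr
        vals = [pe(a, env) for a in args]
        return OPS[op](*vals) if all(known(v) for v in vals) else (op, *vals)
    return expr


def pe_block(stmts, env):
    fail = False                  # the temporary fail flag
    local = dict(env)             # known values, including local variables
    residual = []
    for kind, *rest in stmts:
        if kind == "let":
            var, expr = rest
            val = pe(expr, local)
            if known(val):
                local[var] = val
            else:
                fail = True
                local.pop(var, None)  # an older known value is now stale
            residual.append(("let", var, val))
        elif kind == "return":
            ret = pe(rest[0], local)
            if known(ret) and not fail:
                return ret            # the whole block collapses to a value
            residual.append(("return", ret))
            return ("begin", residual)
    raise ValueError("begin-end block has no return statement")


block = [("let", "t", ("*", "c", "v")), ("return", ("+", "t", 1))]
print(pe_block(block, {"c": 2, "v": 3}))  # 7
```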

Function Calls : Function calls are the most complicated expressions to evaluate. First, a
copy of the function body (which is a begin-end expression) is made. Then variable declarations
are added to the top of the copied function body: for each argument in the function argument
list, a variable is declared and its initial value is set to the value being passed
to the function. The following example illustrates this process:


attr = min-val(obj1'val, obj2'val)          attr = begin
function min-val(a, b)                             a := obj1'val
begin                                              b := obj2'val
    ...                                            ...
end                                                end

Once the function call is replaced, the begin-end expression is evaluated. The function call
becomes equational only if the residual return expression of the begin-end expression is equational.
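The call-inlining step can be sketched as a small rewrite. In this hypothetical Python illustration, a call is `("call", fname, arg, ...)`, `inline_call` is a name introduced here, and `functions` maps each function name to its parameter list and body statements. The body given for min-val is invented for the example because the text does not show it.

```python
def inline_call(call, functions):
    """Turn ("call", fname, arg1, ...) into a begin-end block.

    functions maps fname -> (params, body), where body is a list of
    statements such as ("let", var, expr) and ("return", expr). One
    ("let", param, arg) declaration per argument is prepended to a copy
    of the body, so the block can then be partially evaluated.
    """
    _, fname, *args = call
    params, body = functions[fname]
    if len(params) != len(args):
        raise TypeError(f"{fname} expects {len(params)} arguments, got {len(args)}")
    decls = [("let", p, a) for p, a in zip(params, args)]
    return ("begin", decls + list(body))   # list() copies the body


# Hypothetical body for min-val; the text does not show the real one.
functions = {"min-val": (["a", "b"], [("return", ("min", "a", "b"))])}
print(inline_call(("call", "min-val", "obj1'val", "obj2'val"), functions))
```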

By specifying the appropriate partial primitive data for the performance model, all expressions
and statements in the performance model can be reduced to equational form during
evaluation. When all primitive data is supplied, the entire performance model is
evaluated, with every attribute having a single real value. This is full evaluation of the model
and does not require verification.

3  Verification  of  Performance  Models 

Performance verification is the problem of determining whether a performance model can
simultaneously satisfy a set of relational constraints on its attributes. Interval mathematics [9, 10, 11]
provides a convenient technique for representing relational constraints as intervals. The constraints
are specified, the interval technique is applied, and a verification result is produced. This result
is a statement either that the constraints can be met (“yes”) or that they cannot be met
(“no”).

However, our approach is limited to performance models that contain only equations. That is,
every evaluation rule is composed only of mathematical operators such as +, -, *, /, x^y,
negation, exp(), and log().

Interval Notation: An interval is a tuple of the form [a, b] where a ≤ b. It denotes the set
of all values from a to b, both inclusive. A relational constraint on an attribute is represented
by an interval. Figure 6 shows the interval notation for each type of relation that is possible
on attribute X. A disjunction of constraints on a single attribute is represented by the union of the
corresponding intervals. For example, the constraint X < 4 or X ≥ 6 would be written as
[-∞, 4) ∪ [6, ∞].

Given a performance model, relational constraints can be placed on various attributes in the
performance model. Relational constraints on primitive attributes state assumptions about
the permitted variation in the operating conditions of the performance model, and relational
constraints on computed attributes state the desired performance goals.

Initially, each attribute is assigned an initial interval. A computed attribute with an equation, or
a primitive attribute with no user-specified relational constraints, has an initial interval of [-∞, ∞].


Relation       Interval
X = c          [c, c]
X ≤ c          [-∞, c]
X ≥ c          [c, ∞]
a ≤ X ≤ b      [a, b]
a < X ≤ b      (a, b]
a ≤ X < b      [a, b)
a < X < b      (a, b)
X ≠ c          [-∞, c) ∪ (c, ∞]


Figure 6: Equivalent Relation and Interval
[Figure: expression parse tree for x = a/b + c*d + 5]


Figure 7: Example Expression Parse Tree


Attributes that have a constant real value val are assigned an initial interval of [val, val]. Any
constant value appearing in an equation also has an initial interval of [val, val]. A user-specified
constraint placed on an attribute replaces the initial interval for the attribute.

Algorithm for Interval Analysis: To make the explanation of the algorithm clearer, we
assume that each attribute has only a single, real-valued interval constraint. A companion paper
[12] describes how a variation of this technique can be used to incorporate multiple intervals
(multiple relational constraints) for each attribute, as well as integer intervals (including handling of
enumerated range intervals).

Before the analysis begins, each equation is parsed into an expression tree (parse tree). Internal
nodes in the tree are mathematical operators, with edges pointing to either one or two child
nodes depending on whether the operator is unary or binary. The leaves of the tree are either
attribute names or constant values. Figure 7 is an expression tree for the equation x = a/b +
c*d + 5. An expression parse tree is generated in this fashion for every equation in the performance
model, so the entire performance model is represented as a forest of expression trees.

The interval analysis algorithm makes repeated use of two basic steps: a forward interval analysis
step followed by a backward interval analysis step. In the forward direction, beginning with rules
having an evaluation order of 1, each equation is evaluated using interval mathematics. Interval
mathematics defines how each operator behaves when calculating with intervals. Figure 8 shows
each mathematical operator and how to determine the resulting interval.


addition          [a,b] + [c,d] = [a+c, b+d]
subtraction       [a,b] - [c,d] = [a-d, b-c]
multiplication    [a,b] * [c,d] = [min(a*c, b*c, a*d, b*d), max(a*c, b*c, a*d, b*d)]
division          [a,b] / [c,d] = ([a,b] / [c,0)) ∪ ([a,b] / (0,d])   when [c,d] contains zero
division          [a,b] / [c,d] = [a,b] * [1/d, 1/c]   when [c,d] does not contain zero
minus             -[a,b] = [-b, -a]
exp()             exp([a,b]) = [exp(a), exp(b)]
log()             log([a,b]) = [log(a), log(b)]   when a > 0
log()             log([a,b]) = UNDEFINED   when b < 0
power             [a,b]^[c,d] = exp([c,d] * log([a,b]))   when a > 0
union             [a,b] ∪ [c,d] = [min(a,c), max(b,d)]
intersection      [a,b] ∩ [c,d] = [max(a,c), min(b,d)]

Figure 8: Mathematical Operators on Intervals
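The operators of Figure 8 translate directly into a small interval type. The Python sketch below assumes closed intervals with float bounds (infinities allowed) and treats 0 * inf as 0; the `Interval` class is a name introduced here. Two simplifications are assumptions of this sketch, not the paper's: division by an interval containing zero returns [-inf, inf], a conservative interval that contains the two-piece union of Figure 8, and log() is only handled when the lower bound is positive.

```python
import math
from dataclasses import dataclass


def _mul(x, y):
    # Treat 0 * inf as 0, the usual convention in interval arithmetic.
    return 0.0 if x == 0 or y == 0 else x * y


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def is_empty(self):
        return self.hi < self.lo

    def __add__(self, o):
        return Interval(self.lo + o.lo, self.hi + o.hi)

    def __sub__(self, o):
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __neg__(self):
        return Interval(-self.hi, -self.lo)

    def __mul__(self, o):
        ps = [_mul(self.lo, o.lo), _mul(self.lo, o.hi),
              _mul(self.hi, o.lo), _mul(self.hi, o.hi)]
        return Interval(min(ps), max(ps))

    def __truediv__(self, o):
        if o.lo <= 0 <= o.hi:
            # Conservative: [-inf, inf] contains the two-piece union.
            return Interval(-math.inf, math.inf)
        return self * Interval(1 / o.hi, 1 / o.lo)

    def exp(self):
        return Interval(math.exp(self.lo), math.exp(self.hi))

    def log(self):
        if self.lo <= 0:
            raise ValueError("this sketch only handles log on [a, b] with a > 0")
        return Interval(math.log(self.lo), math.log(self.hi))

    def __pow__(self, o):
        return (o * self.log()).exp()        # [a,b]^[c,d], requires a > 0

    def union(self, o):                      # Figure 8 "union" (the hull)
        return Interval(min(self.lo, o.lo), max(self.hi, o.hi))

    def __and__(self, o):                    # intersection
        return Interval(max(self.lo, o.lo), min(self.hi, o.hi))


# Forward step at the inner node of Figure 9: [7,11] + [7,7], then ∩ [-100,100].
print((Interval(7, 11) + Interval(7, 7)) & Interval(-100, 100))
# Interval(lo=14, hi=18)
```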
[Figure: forward analysis on the parse tree of a + (b + 7). Before: a = [2,5], b = [7,11],
constant 7 = [7,7], and both + nodes [-100,100]. After: inner node [7,11] + [7,7] = [14,18],
[14,18] ∩ [-100,100] = [14,18]; root [2,5] + [14,18] = [16,23], [16,23] ∩ [-100,100] = [16,23].]


Figure 9: Forward Interval Analysis Example


[Figure: backward analysis on the parse tree of a + (b + 7). Before: root [5,11], a = [-10,10],
inner + node [-5,12], b = [-40,40], constant 7 = [7,7]. After: a: [5,11] - [-5,12] = [-7,16],
[-10,10] ∩ [-7,16] = [-7,10]; inner node: [5,11] - [-10,10] = [-5,21], [-5,21] ∩ [-5,12] = [-5,12];
b: [-5,12] - [7,7] = [-12,5], [-12,5] ∩ [-40,40] = [-12,5]; constant: [-5,12] - [-40,40] = [-45,52],
[-45,52] ∩ [7,7] = [7,7].]

Figure 10: Backward Interval Analysis Example


Forward interval analysis of an equation traverses the expression tree from the leaves
to the root. The intervals at the leaf nodes are passed to their parent nodes. In each parent
node, the node's operator is applied to the child intervals, producing a new interval. This new interval is
intersected with the current interval at that node to produce the final result. This process is
repeated until the interval at the root of the tree has been revised. Figure 9 is a simple example that
illustrates forward interval analysis on an expression parse tree. Forward (upward) propagation
of intervals thus computes each parent's interval from its children's intervals.

Each equation with an evaluation order of 1 is analyzed in this manner. Next, the equations
with evaluation order 2 are analyzed, and this process continues until all equations have been
forward analyzed. If at any time an empty or illegal interval is generated, all analysis stops.
An illegal interval is an interval [a, b] where b < a. This “interval” has no values in it and is
considered empty. Once an interval becomes empty, no further propagation can occur, because
intersection with an empty interval always produces an empty interval.

The occurrence of an empty interval means that the given performance model cannot
simultaneously satisfy all constraints. Thus, analysis stops and the result of verification is that
the constraints cannot be met; that is, the system of constraints cannot be satisfied by the
model. There is no possible assignment of values to the primitive attributes within the specified
ranges that would meet the overall performance goals as stated.
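A forward pass over a single tree can be sketched as a post-order traversal. This Python illustration is hypothetical (the `Node` class and the `combine` and `forward` helpers are introduced here), covers only +, - and *, and ignores the fact that leaves naming the same attribute in different trees must share one interval.

```python
class Node:
    """Parse-tree node holding a closed interval (lo, hi)."""

    def __init__(self, interval, op=None, children=()):
        self.interval = interval
        self.op = op                   # None marks a leaf
        self.children = list(children)


def combine(op, x, y):
    if op == "+":
        return (x[0] + y[0], x[1] + y[1])
    if op == "-":
        return (x[0] - y[1], x[1] - y[0])
    if op == "*":
        ps = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
        return (min(ps), max(ps))
    raise ValueError(f"operator {op!r} not covered by this sketch")


def forward(node):
    """Revise intervals bottom-up; return False once an interval is empty."""
    if node.op is None:
        return True
    if not all(forward(child) for child in node.children):
        return False
    left, right = (c.interval for c in node.children)
    new = combine(node.op, left, right)
    # Intersect the computed interval with the node's current interval.
    node.interval = (max(node.interval[0], new[0]), min(node.interval[1], new[1]))
    return node.interval[0] <= node.interval[1]


# Figure 9's tree, a + (b + 7), with both + nodes starting at [-100, 100].
inner = Node((-100, 100), "+", [Node((7, 11)), Node((7, 7))])
root = Node((-100, 100), "+", [Node((2, 5)), inner])
print(forward(root), inner.interval, root.interval)  # True (14, 18) (16, 23)
```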

However, if all equations have been forward analyzed and no empty intervals were
generated, the next step is backward interval analysis. In backward analysis, the expression
X = A + B    :  A = X - B  and  B = X - A
X = A - B    :  A = X + B  and  B = A - X
X = A * B    :  A = X / B  and  B = X / A
X = A / B    :  A = X * B  and  B = A / X
X = -A       :  A = -X
X = log(A)   :  A = exp(X)
X = exp(A)   :  A = log(X)
X = A^B      :  A = exp(log(X) / B)  and  B = log(X) / log(A)


Figure 11: Inverse Calculations


parse trees are used again. However, evaluation starts at the root and propagates intervals down
the tree instead of up the tree as in forward analysis. For each node, a new interval value is
calculated using the current interval values of the parent node and the sibling node. This newly
calculated interval is intersected with the node's current interval to obtain the node's
new interval value.

To calculate a new interval for a node, the inverse of the operator at its parent node must
be considered. For example, suppose there is an addition node with an interval X and two
children with intervals A and B. In the forward propagation direction the expression would
be X = A + B. In backward propagation, however, a new interval is calculated for A using
A = X - B, and a new interval for B is calculated as B = X - A. Each node's computed
interval is intersected with its current interval, and the algorithm traverses the expression tree
until leaf nodes are reached. Figure 10 shows an example of backward analysis for the same
expression tree as in Figure 9.

Every  mathematical  operator  in  the  expression  trees  must  have  an  inverse  operator  for  backward 
analysis  to  work  correctly.  (This,  in  fact,  necessitates  the  restriction  that  this  technique  is 
applicable  to  invertible  equational  performance  models  only.)  Figure  11  shows  the  inverses  for 
each  operator  where  X  is  the  interval  of  the  current  node,  A  is  the  interval  of  the  left  child  and 
B  is  the  interval  for  the  right  child. 
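Backward propagation for + and - follows directly from the first two rows of Figure 11. In this hypothetical Python sketch (the `Node` class and the `backward` function are names introduced here), both child intervals are computed from the values held before either child is updated, which reproduces the numbers printed in Figure 10; other operators are left out.

```python
class Node:
    def __init__(self, interval, op=None, children=()):
        self.interval = interval       # closed interval (lo, hi)
        self.op = op                   # None marks a leaf
        self.children = list(children)


def add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def sub(x, y):
    return (x[0] - y[1], x[1] - y[0])


def meet(x, y):
    return (max(x[0], y[0]), min(x[1], y[1]))


def backward(node):
    """Revise child intervals top-down with the inverses of + and -."""
    if node.op is None:
        return True
    left, right = node.children
    x, a, b = node.interval, left.interval, right.interval
    if node.op == "+":            # X = A + B  ->  A = X - B,  B = X - A
        new_a, new_b = sub(x, b), sub(x, a)
    elif node.op == "-":          # X = A - B  ->  A = X + B,  B = A - X
        new_a, new_b = add(x, b), sub(a, x)
    else:
        raise ValueError(f"operator {node.op!r} not covered by this sketch")
    left.interval, right.interval = meet(a, new_a), meet(b, new_b)
    if any(c.interval[0] > c.interval[1] for c in (left, right)):
        return False              # empty interval: constraints unsatisfiable
    return backward(left) and backward(right)


# Figure 10's starting intervals on the tree a + (b + 7).
a, b, seven = Node((-10, 10)), Node((-40, 40)), Node((7, 7))
inner = Node((-5, 12), "+", [b, seven])
root = Node((5, 11), "+", [a, inner])
print(backward(root), a.interval, inner.interval, b.interval)
# True (-7, 10) (-5, 12) (-12, 5)
```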

Backward analysis continues until all equations have been backward analyzed, unless an empty
interval is produced first. When an empty interval is produced, all analysis stops and the
result is that the performance model is unsatisfiable with the given set of constraints. Otherwise,
forward and backward propagation are repeated until no further interval changes occur. When this
happens, the constraints are satisfiable (i.e., there exists a set of values that, when applied to
the model, will produce a solution in the desired range).

Figure 12 is the algorithm for the entire verification process with forward and backward analysis.
NSet is the set of all nodes in the expression trees for all expressions in the performance model.
Note that the algorithm always produces a result of either satisfied or unsatisfied. If
an empty interval is generated during iteration, the algorithm stops and returns
a status of unsatisfied. Otherwise, the outer while loop continues to iterate
until no node interval changes during a forward and backward iteration. In theory, it is possible
that this may never happen. However, due to the computer's finite precision, there will always
be an iteration in which no change occurs. In practice, this precision limit is small enough
that it does not affect the results in practical performance modeling situations.
VERIFY-MODEL(NSet)
begin
    Determine_Evaluation_Order(NSet)
    Done ← false
    while (Done is false)
        Done ← true
        Tstep ← 1                                   /* Forward Propagation */
        while (Tstep is less than or equal to the largest evaluation order)
            OSet ← {All nodes in NSet with order equal to Tstep}
            for each N in OSet
                I ← Get_Interval(N)
                Itemp ← Perform_Interval_Operation(N)
                Inew ← I ∩ Itemp
                if (Inew is empty) then
                    return Unsatisfied
                end if
                if (I not equal to Inew) then
                    Replace_Interval(N, Inew)
                    Done ← false
                end if
            end for
            Tstep ← Tstep + 1
        end while
        Tstep ← largest evaluation order            /* Backward Propagation */
        while (Tstep > 0)
            OSet ← {All nodes in NSet with order equal to Tstep}
            for each N in OSet
                I ← Get_Interval(N)
                IL ← Get_Left_Child_Interval(N)
                IR ← Get_Right_Child_Interval(N)
                ItempR ← Perform_Inverse_Interval_Operation(I, IL)
                ItempL ← Perform_Inverse_Interval_Operation(I, IR)
                InewR ← IR ∩ ItempR
                InewL ← IL ∩ ItempL
                if (InewL or InewR is empty) then
                    return Unsatisfied
                end if
                if (IL not equal to InewL) then
                    Replace_Left_Interval(N, InewL)
                    Done ← false
                end if
                if (IR not equal to InewR) then
                    Replace_Right_Interval(N, InewR)
                    Done ← false
                end if
            end for
            Tstep ← Tstep - 1
        end while
    end while
    return Satisfied
end


Figure 12: Verification Algorithm
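The loop structure of Figure 12 can be prototyped compactly if every operator node is first flattened into a named auxiliary attribute, so that each equation becomes a list of constraints z = x op y. The Python sketch below makes that simplification, which is an assumption and not the paper's node-based data structure. It supports only +, - and *, expects the constraints in evaluation order, and adds a round cap `max_rounds` as a safety net instead of relying on floating-point saturation. The names `verify`, `FORWARD`, `INVERSE` and `Empty` are introduced here.

```python
import math

INF = math.inf


def _m(x, y):
    return 0.0 if x == 0 or y == 0 else x * y      # 0 * inf taken as 0


def add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def sub(x, y):
    return (x[0] - y[1], x[1] - y[0])


def mul(x, y):
    ps = [_m(x[0], y[0]), _m(x[0], y[1]), _m(x[1], y[0]), _m(x[1], y[1])]
    return (min(ps), max(ps))


def div(x, y):
    if y[0] <= 0 <= y[1]:
        return (-INF, INF)                          # conservative
    return mul(x, (1 / y[1], 1 / y[0]))


FORWARD = {"+": add, "-": sub, "*": mul}
INVERSE = {                                         # Figure 11: (new A, new B)
    "+": lambda x, a, b: (sub(x, b), sub(x, a)),
    "-": lambda x, a, b: (add(x, b), sub(a, x)),
    "*": lambda x, a, b: (div(x, b), div(x, a)),
}


class Empty(Exception):
    pass


def verify(constraints, box, max_rounds=10_000):
    """constraints: list of (z, op, x, y) meaning z = x op y, in evaluation
    order; box: name -> (lo, hi). Returns ("Satisfied" | "Unsatisfied", box)."""
    box = dict(box)
    for z, _, x, y in constraints:
        for name in (z, x, y):
            box.setdefault(name, (-INF, INF))       # unconstrained attribute

    def narrow(name, candidate):
        old = box[name]
        new = (max(old[0], candidate[0]), min(old[1], candidate[1]))
        if new[0] > new[1]:
            raise Empty(name)
        box[name] = new
        return new != old

    for _ in range(max_rounds):
        changed = False
        try:
            for z, op, x, y in constraints:                     # forward
                changed |= narrow(z, FORWARD[op](box[x], box[y]))
            for z, op, x, y in reversed(constraints):           # backward
                new_x, new_y = INVERSE[op](box[z], box[x], box[y])
                changed |= narrow(x, new_x)
                changed |= narrow(y, new_y)
        except Empty:
            return "Unsatisfied", box
        if not changed:
            return "Satisfied", box
    return "Satisfied", box    # round cap hit without finding an empty interval


# A hypothetical two-term model: total = p1 + p2.
status, box = verify([("total", "+", "p1", "p2")],
                     {"p1": (10, 20), "p2": (5, 6), "total": (0, 20)})
print(status, box["p1"])   # Satisfied (10, 15)
```

The example shows backward propagation at work: the goal total ≤ 20 narrows p1 from [10, 20] to [10, 15] before the iteration reaches a fixed point.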
Constraints
    g1'capacitance   : [10.0, 25.0]
    g2'capacitance   : [5.0, 10.0]
    g3'capacitance   : [5.0, 20.0]
    g4'capacitance   : [8.0, 15.0]
    ckt'system_freq  : [10.0, 30.0]
    ckt'power        : [0.0, 12000.0]
    ckt'voltage      : [1.0, 5.0]

Results: Constraints were satisfiable
    ckt'power        : [63.125, 11660.2]
    g1'power         : [25, 4687.5]
    g2'power         : [12.5, 1875]
    g3'power         : [9.375, 2812.5]
    g4'power         : [16.25, 2285.16]

Figure 13: First Verification Configuration


Constraints
    g1'capacitance   : [10.0, 25.0]
    g2'capacitance   : [5.0, 10.0]
    g3'capacitance   : [5.0, 20.0]
    g4'capacitance   : [8.0, 15.0]
    ckt'power        : [0.0, 2000.0]
    ckt'voltage      : [3.3, 3.5]
    ckt'system_freq  : [30.0, 50.0]

Results: Constraints were not satisfiable
    ckt'power        : []
    g1'power         : [816.75, 3828.13]
    g2'power         : [408.375, 1531.25]
    g3'power         : [306.281, 2296.88]
    g4'power         : [530.888, 1866.21]

Figure 14: Second Verification Configuration


4  Implementation  and  Results 

The partial evaluator and the interval-analysis-based performance verification tool are
implemented in C++ on Sun SPARC platforms. In the first subsection below, we show the interval
constraints and the results produced by verification of the reduced equational performance model
shown in Figure 4. Three different verification exercises for this model are presented to describe
how the verifier can be used. The second subsection shows evaluation and verification times for two
different performance models for large design net-lists.


4.1  Verification  of  the  Performance  Model  for  Power 

A constraint configuration, or simply configuration, specifies the relational constraints to be placed
on the attributes of a performance model. Figure 13 is one configuration for the primitive
attributes in the performance model. Additionally, we constrain ckt'power to answer the question:
with the given primitive attribute constraints, can the power constraint be satisfied?

The equational model and configuration are given as input to the verifier, and two results are
produced. First, the verifier reports whether or not all the constraints were satisfied. In
addition, it lists all the attributes and their last calculated interval values when analysis
finished. For the configuration in Figure 13, the constraints were satisfiable. Only the intervals
that differ from the original configuration are shown in the figure. Notice that the interval for
ckt'power has narrowed from the interval originally specified.

Figure 14 is another configuration with slightly different constraints. In this case, the verifier
shows that the constraints were not satisfiable. Again, only those intervals that differ
from the original specification are shown.

A final configuration for the performance model uses a union of intervals for several attributes.
Figure 15 shows the intervals separated by commas; a list of intervals separated by commas is
equivalent to the union of those intervals. This configuration was shown to be satisfiable
Constraints
    g1'capacitance   : [10.0, 25.0]
    g2'capacitance   : [5.0, 10.0]
    g3'capacitance   : [5.0, 20.0]
    g4'capacitance   : [8.0, 15.0], [3.0, 3.5]
    ckt'power        : [0.0, 2000.0]
    ckt'voltage      : [3.3, 3.3], [3.0, 3.0]
    ckt'system_freq  : [80.0, 90.0], [30.0, 50.0]

Results: Constraints were satisfiable
    g1'power         : [675, 2812.5]
    g2'power         : [337.5, 1125]
    g3'power         : [253.125, 1687.5]
    g4'power         : [438.75, 1371.09]
    g1'capacitance   : [10, 25]
    g2'capacitance   : [5, 10]
    g3'capacitance   : [5, 20]
    g4'capacitance   : [8, 15]
    ckt'power        : [1704.37, 2000]
    ckt'voltage      : [3, 3]
    ckt'system_freq  : [30, 50]

Figure 15: Third Verification Configuration


by  the  verifier.  This  time,  those  attributes  which  have  a  union  of  intervals  are  shown  with  the 
interval  that  was  used  during  evaluation  to  produce  that  satisfiable  result. 


4.2  Execution  Times 

We  now  present  results  of  partial  evaluation  and  verification  times  for  larger  performance  models. 
The  first  performance  model  was  written  to  calculate  the  throughput  time  of  combinatorial 
circuits,  given  the  delay  times  of  each  of  the  gates.  A  program  was  written  that  generated  12 
different  large  combinatorial  circuits  containing  from  1  to  12,286  net-list  objects  (an  object  being 
a  single  module,  port,  or  net).  Using  PDL,  a  performance  model  for  calculating  throughput  rate 
was  generated  for  each  of  the  12  net-lists. 

Next, each net-list's performance model was partially evaluated, after setting the data arrival time
at the input ports to '0 ns', to produce an equational model. This model was then verified with a set
of constraints that is satisfiable. The same net-list was again verified with a set of constraints
that is not satisfiable.

Times for partial evaluation and verification were measured on a Sun SPARCstation 20 with
256 megabytes of memory. Figure 16 is a plot of all the times for the 12 different net-lists.
With this model, net-lists with fewer than 1000 objects took an insignificant
amount of time to evaluate and verify. However, as the net-list size increased, the verification
time for the satisfiable constraint set increased significantly. By contrast, unsatisfiable constraints
were rejected in a short amount of time, even for large net-lists.

As  a  second  example,  a  model  for  calculating  dynamic  power  in  CMOS  logic  circuits  was  used  for 
14  different  logic  circuits.  Net-lists  ranged  in  size  from  1  to  49,150  objects.  Again,  each  net-list 
was  partially  evaluated  to  produce  an  equational  model,  then  verified  with  a  set  of  satisfiable 
constraints  and  a  set  of  unsatisfiable  constraints.  Figure  17  shows  the  plot  of  the  times  for 
the  various  net-lists.  In  this  example,  verification  of  the  satisfiable  constraints  was  faster  than 
evaluation  and  verification  with  unsatisfiable  constraints. 


5  Conclusion 

This  paper  presented  a  partial  evaluation  technique  to  simplify  procedural  performance  models 
and  render  them  in  an  equational  form  in  which  they  can  be  subjected  to  formal  verification  using 
interval  analysis.  This  process  is  similar  to  the  use  of  symbolic  or  trajectory  evaluation  followed 
by boolean tautology checking for formal verification of logic circuits [13, 6]. Experimental
results show that both the partial evaluation and the interval-analysis-based verification techniques
are quite fast even for net-lists containing several thousand design objects.

We are currently investigating techniques for more closely integrating partial evaluation and
interval propagation, and for partially evaluating and verifying models that contain dynamic
performance attributes that assume streams of values.

Figure 16: Evaluation and Verification Times for Delay Model

Figure 17: Evaluation and Verification Times for Power Model
APPENDIX J:

Hierarchical  Behavioral  Partitioning  for  Multicomponent  Synthesis 


Affiliation:  EURO-DAC 


Abstract 

Packaging technology has improved tremendously
over the last decade. Various packaging options, such
as ASICs, MCMs, boards, etc., should be thoroughly explored
at early stages of the system-synthesis cycle. In this
paper we present a hierarchical behavioral partitioning
algorithm that partitions the input behavioral
specification into a hierarchical structure and binds all
elements of the structure to appropriate packages from
a given package library. As an application of our
partitioner, we integrated the partitioner with a high-level
synthesis tool to create an environment for
multicomponent synthesis and hierarchical package design.
We provide detailed partitioning algorithms and
experimental results.

1  Introduction 

High level synthesis converts a behavioral specification of a digital system into an equivalent RTL design that meets a set of stated performance constraints [1, 2, 3]. The RTL design is composed of a data path and a finite state controller; the data path is a composition of components selected from a register-level component library. This RTL design can be partitioned into multiple segments to realize a multichip design. Partitioning RTL designs, however, has several drawbacks: (1) control lines may cross segment boundaries; (2) operators may be shared by operands in different segments, which results in poor performance due to interchip communication; (3) the design is fixed during synthesis, so there is very little scope for circuit transformations to improve performance; (4) RTL designs are much larger than their behavioral counterparts, and since the solution space grows rapidly with the size of the synthesized behavior, the partitioning process becomes very time consuming; and (5) power estimation or measurement for RTL designs is too time consuming to be viable for very large designs.

Recent efforts in system-level synthesis have led to the development of high level synthesis systems that can produce multichip digital systems [4, 5, 6]. These systems, however, do not consider the impact of packaging on high level synthesis, and hence the designs they produce cannot efficiently use available high performance packaging technology. For very large, performance critical designs, an efficient hierarchical behavioral partitioner that fully explores the various packaging options is required to overcome the drawbacks of RTL partitioning. The inputs to the hierarchical behavioral partitioner are: (1) a behavioral specification to partition; (2) a parameterized register level component library characterized for area, delay, and switching activity; (3) a package library with area, pin, switching activity, clock speed, and cost information for all packages; and (4) a cost constraint C, in dollars, on the entire design. The outputs of the partitioner are: (1) a set of behavioral specifications that together form the original specification; (2) a set of structures that realize the hierarchical design; and (3) a binding of the behavioral specifications and the structures to appropriate cost-effective packages from the package library.

Categories: 2.1, 4.5 and 2.8

Figure 1: Hierarchical Behavioral Partitioning

The input behavioral specification (which may be given in VHDL) consists of a set of communicating, concurrently executing processes. This specification is represented internally as a process graph whose nodes represent concurrently executing processes and whose edges are communication channels. Figure 1 shows a process graph and its hierarchical partition. All multiple-process segments marked Part-i in the figure are behavioral specifications in their own right and can be synthesized into register level designs. These register level designs, together with the global controller, form the multicomponent design. The hierarchical design mapped onto packages, shown in the board design, forms a package hierarchy for the design. We formulate the hierarchical partitioning problem and propose a solution for the hierarchical partitioning and package binding problem. We show how our partitioner can be integrated with a high level synthesis tool to create an environment for multicomponent synthesis and hierarchical package binding. Experimental results for a number of designs are presented.

2  Problem  Formulation 

Definitions 2.1 and 2.2 introduce the concept of a hierarchical k-level partition of a set. Definition 2.3 extends this notion to a k-level partition of a graph $G = (N, E)$ (in our case, a process graph), where $N$ is the set of nodes and $E$ is the set of edges.

Definition 2.1 A 1-level partition of a set $N$ is a collection $S$ of nonempty sets (called segments) such that

• $S$ is a collection of mutually disjoint sets, i.e., if $C \in S$, $D \in S$, and $C \neq D$, then $C \cap D = \emptyset$; and

• the union of $S$ is the whole set $N$, i.e., $\bigcup_{s \in S} s = N$. □

Definition 2.2 A k-level partition $\mathcal{P}$ of a set $N$ is a set of 1-level partitions $P_1, P_2, \ldots, P_k$ such that

• for $1 \le i < k$, $P_{i+1}$ is a 1-level partition of $P_i$; and

• $P_1$ is a 1-level partition of $N$. □

Definition 2.3 A k-level partition of a graph $G = (N, E)$ is a k-level partition of $N$, where $N$ is the set of nodes and $E$ is the set of edges. □
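Definitions 2.1 and 2.2 can be checked mechanically. The following Python sketch is illustrative only: the function names are hypothetical, and each level-(i+1) segment is represented, by assumption, as a frozenset of level-i segments.

```python
from typing import FrozenSet, Hashable, List, Set

Segment = FrozenSet[Hashable]


def is_one_level_partition(segments: List[Segment], universe: Set[Hashable]) -> bool:
    """Definition 2.1: nonempty, mutually disjoint segments whose union is the set."""
    covered: Set[Hashable] = set()
    for seg in segments:
        if not seg:            # every segment must be nonempty
            return False
        if covered & seg:      # overlap with an earlier segment breaks disjointness
            return False
        covered |= seg
    return covered == set(universe)  # the union must be the whole set


def is_k_level_partition(levels: List[List[Segment]], nodes: Set[Hashable]) -> bool:
    """Definition 2.2: P1 partitions the nodes and P(i+1) partitions P(i)."""
    if not levels or not is_one_level_partition(levels[0], nodes):
        return False
    return all(is_one_level_partition(upper, set(lower))
               for lower, upper in zip(levels, levels[1:]))


# Example: four processes grouped into three level-1 segments, then two level-2 segments.
a, b, c = frozenset({"p1", "p2"}), frozenset({"p3"}), frozenset({"p4"})
P1 = [a, b, c]
P2 = [frozenset({a, b}), frozenset({c})]
print(is_k_level_partition([P1, P2], {"p1", "p2", "p3", "p4"}))  # True
```

Representing higher-level segments as sets of lower-level segments lets the same 1-level check be reused at every level, mirroring how Definition 2.2 is built from Definition 2.1.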

The performance attributes of the nodes in graph $G$ and of the level 1 partition segments (each segment is viewed as a subgraph of $G$, or a subset of the processes in the behavioral specification) are determined through scheduling and performance estimation of the individual nodes or segments [12, 13, 15]. Thus, for any segment $s \in P_1$, the performance attributes $A(s)$, $H(s)$, $T(s)$, and $B(s)$ (area, switching activity, clock period, and pin count, respectively) are computed by the performance estimator built into the partitioning environment. This process is similar to the scheduling and performance estimation steps in high level synthesis [12, 15].

We have a set of packages $p_1, p_2, \ldots, p_n$ in a package library $\mathcal{L}$. Each package $p$ has six attributes: $A(p)$, the area capacity; $H(p)$, the maximum switching activity; $T(p)$, the period of the fastest clock allowed by the package; $B(p)$, the number of pins available in $p$; $C(p)$, the dollar cost of $p$; and $L(p) \ge 1$, the level number of $p$. The level of a package is the level in the packaging hierarchy at which the package can be used: all bare-die packages are level one, ASICs and MCMs are level two, boards are level three, and so on. The defining level of a library is the smallest $k$ such that no package in the library has a level greater than $k$. For $i > 1$, packages of level $i$ can contain only packages of level $i - 1$, and level 1 packages contain the nodes and segments of the process graph. The hierarchical partitioner assigns a package $p \in \mathcal{L}$ to each partition segment in $P_1, P_2, \ldots, P_k \in \mathcal{P}$. All packages can be instanced multiple times; that is, two different segments can be assigned the same package type. All segments in $P_i$, called the level $i$ segments, can be assigned only to a package of level $i$. If $p$ and $q$ are two package instances, then $p \prec q$ denotes "$p$ contains $q$".

Definition 2.4 For any instance $p$ of a package from the package library $\mathcal{L}$ (where $p \prec q$ denotes that $p$ contains $q$):

If $2 \le L(p) \le k$:

(a) area cost of the package: $a(p) = \sum_{q \,:\, p \prec q} a(q)$

(b) heat cost of the package: $h(p) = \sum_{q \,:\, p \prec q} h(q)$

(c) pin cost of the package: $b(p) = \sum e$, summed over every edge $e$ that spans package instances $p_a$ and $p_b$ such that $(L(p_a) = L(p_b) = L(p) - 1) \wedge (p \prec p_a) \wedge (p \not\prec p_b)$

(d) clock period cost: $t(p) = \max_{q \,:\, p \prec q} t(q)$

When $L(p) = 1$, the scheduler and performance estimator determine the above costs from the level 1 segment in $p$. □
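A minimal Python sketch of the cost recurrences in Definition 2.4, assuming a containment tree of package instances whose level-1 leaves already carry the estimator's area, heat, and clock-period values. The class and function names are hypothetical. Each edge between level-1 instances is assumed to carry a pin width (use 1 to simply count edges), since the definition does not say how an edge contributes to $b(p)$; counting edges with exactly one endpoint inside $p$ matches condition (c) when every leaf sits at the bottom of a complete hierarchy.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class PackageInstance:
    name: str
    level: int  # L(p): 1 = bare die, 2 = ASIC/MCM, 3 = board, ...
    children: List["PackageInstance"] = field(default_factory=list)
    # For level-1 instances these values come from the scheduler/performance estimator.
    area: float = 0.0
    heat: float = 0.0
    clock_period: float = 0.0


def leaves(p: PackageInstance) -> Set[str]:
    """Names of the level-1 instances contained, directly or indirectly, in p."""
    if p.level == 1:
        return {p.name}
    return set().union(*(leaves(q) for q in p.children))


def area_cost(p: PackageInstance) -> float:      # (a) a(p): sum over contained q
    return p.area if p.level == 1 else sum(area_cost(q) for q in p.children)


def heat_cost(p: PackageInstance) -> float:      # (b) h(p): sum over contained q
    return p.heat if p.level == 1 else sum(heat_cost(q) for q in p.children)


def clock_cost(p: PackageInstance) -> float:     # (d) t(p): slowest contained clock
    return p.clock_period if p.level == 1 else max(clock_cost(q) for q in p.children)


def pin_cost(p: PackageInstance, edges: List[Tuple[str, str, int]]) -> int:
    """(c) b(p): total width of edges with exactly one endpoint inside p."""
    inside = leaves(p)
    return sum(w for u, v, w in edges if (u in inside) != (v in inside))


d1 = PackageInstance("d1", 1, area=40.0, heat=2.0, clock_period=25.0)
d2 = PackageInstance("d2", 1, area=30.0, heat=1.5, clock_period=30.0)
d3 = PackageInstance("d3", 1, area=50.0, heat=3.0, clock_period=20.0)
mcm = PackageInstance("mcm", 2, children=[d1, d2])
edges = [("d1", "d2", 16), ("d2", "d3", 8), ("d1", "d3", 4)]
print(area_cost(mcm), heat_cost(mcm), clock_cost(mcm), pin_cost(mcm, edges))
# 70.0 3.5 30.0 12
```

The edge between d1 and d2 stays inside the MCM and costs no pins; only the two edges to d3 leave the package.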

Hierarchical Partitioning Problem: Given a process graph $G = (N, E)$, a package library $\mathcal{L}$ with defining level $k$, and a cost constraint $C$:

• find a $(k-1)$-level partition $\mathcal{P} = \{P_1, P_2, \ldots, P_{k-1}\}$ of $G$;

• let $P_k = \{s_k\}$, where $s_k = \{s_{k-1} \mid s_{k-1} \in P_{k-1}\}$; that is, $P_k$ contains exactly one segment (which in turn contains all the segments in $P_{k-1}$), to be mapped to a topmost-level package in the library;

• find a binding $\beta$ which, for $1 \le i \le k$, binds each segment in $P_i$ to some level $i$ package instance from $\mathcal{L}$, such that for each instance $p$ of any package from $\mathcal{L}$:

$a(p) \le A(p), \quad h(p) \le H(p), \quad b(p) \le B(p), \quad t(p) \ge T(p),$

subject to

$\mathrm{Cost}(\mathcal{P}) = \sum_{\text{instance } p} C(p), \quad \mathrm{Cost}(\mathcal{P}) \le C.$ □
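Once the costs $a(p)$, $h(p)$, $b(p)$, and $t(p)$ of every bound instance are known, checking a candidate binding against the problem's constraints is a direct transcription. The sketch below is hypothetical (the class and field names are ours): it checks each instance against its package type and then compares the summed package costs with $C$.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PackageType:
    name: str
    level: int           # L(p)
    max_area: float      # A(p), area capacity
    max_heat: float      # H(p), maximum switching activity
    min_period: float    # T(p), period of the fastest clock allowed
    pins: int            # B(p), pins available
    cost: float          # C(p), dollar cost


@dataclass
class BoundInstance:
    ptype: PackageType
    area: float          # a(p)
    heat: float          # h(p)
    pins: int            # b(p)
    period: float        # t(p)


def instance_ok(i: BoundInstance) -> bool:
    """Capacity constraints of one instance: a<=A, h<=H, b<=B, t>=T."""
    t = i.ptype
    return (i.area <= t.max_area and i.heat <= t.max_heat
            and i.pins <= t.pins and i.period >= t.min_period)


def binding_feasible(instances: List[BoundInstance], C: float) -> bool:
    total = sum(i.ptype.cost for i in instances)   # Cost(P): sum of C(p) over instances
    return all(instance_ok(i) for i in instances) and total <= C


die = PackageType("die-A", 1, 100.0, 5.0, 20.0, 64, 50.0)
mcm = PackageType("mcm-B", 2, 400.0, 15.0, 25.0, 256, 900.0)
instances = [BoundInstance(die, 80.0, 3.0, 40, 25.0),
             BoundInstance(die, 90.0, 4.0, 60, 30.0),
             BoundInstance(mcm, 170.0, 7.0, 48, 30.0)]
print(binding_feasible(instances, 1000.0), binding_feasible(instances, 999.0))
# True False
```

Note the reversed inequality for the clock: an instance's clock period may not be shorter than the fastest clock period its package allows.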

3 The Behavior Level Hierarchical Partitioning Algorithm

The algorithm begins by partitioning the process graph and mapping the partition segments onto available bare-die packages. A graph is then constructed from the partition generated at this level for further partitioning at the next higher level of packaging: the packaged partition segments form the nodes of the new graph, and the edges of the current graph that connect nodes in different segments form the edges of the new graph. At the next higher level of packaging, this new graph is partitioned and mapped onto packages. This process continues until the packaging hierarchy is exhausted, and at each level the partition segments are mapped onto cost-effective packages. If no solution is found at a particular level, we back-track to the previous level, tighten the cost constraints, construct a new partition, and continue. The steps of the algorithm are explained below.
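The graph construction step can be sketched as a graph contraction. In this hypothetical Python helper, each node is mapped to its segment; edges inside a segment vanish, and crossing edges between the same pair of segments are merged by summing an assumed edge width. Merging parallel edges is one possible representation, not necessarily the one used by the authors' netlist manager.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, List, Set, Tuple


def next_level_graph(edges: List[Tuple[Hashable, Hashable, int]],
                     segment_of: Dict[Hashable, Hashable]
                     ) -> Tuple[Set[Hashable], Dict[FrozenSet[Hashable], int]]:
    """Contract each packaged segment into a node; keep only inter-segment edges."""
    nodes = set(segment_of.values())
    merged: Dict[FrozenSet[Hashable], int] = defaultdict(int)
    for u, v, width in edges:
        su, sv = segment_of[u], segment_of[v]
        if su != sv:                       # edges inside a segment disappear
            merged[frozenset((su, sv))] += width
    return nodes, dict(merged)


edges = [("p1", "p2", 8), ("p2", "p3", 16), ("p3", "p4", 8), ("p1", "p4", 4)]
segment_of = {"p1": "A", "p2": "A", "p3": "B", "p4": "B"}
nodes, new_edges = next_level_graph(edges, segment_of)
print(sorted(nodes), new_edges[frozenset({"A", "B"})])  # ['A', 'B'] 20
```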

Setting Constraints: On the first pass, the overall area and switching activity constraints for the entire design are set to the minimum area and switching activity capacity of the packages at the highest level of the package hierarchy (since the design hierarchy must eventually be mapped onto a package at the topmost level). The cost constraint is set by subtracting the cost of the smallest package at every level of packaging above level 1 from the total cost constraint, C. On subsequent invocations, if the algorithm is back-tracking, a cost overrun is computed. If the cost overrun is less than the cost of the previous level's packaging, the cost constraint for the previous level is set by subtracting the product of the cost overrun and a cost overrun factor (COF < 1) from the cost of the previous level's packaging. If, on the other hand, the cost overrun is greater than the cost of the previous level's packaging, the cost constraint for the previous level is set by multiplying the cost of the previous level's packaging by a constraint tighten factor (CTF < 1). COF and CTF dictate the rate at which the cost constraint is tightened on a back-track. Typical values of COF lie between 0.2 and 0.3, and of CTF between 0.9 and 0.95, to enable an effective search of the design space. If the algorithm is not back-tracking, the cost constraint is generated by subtracting the actual cost of packaging at the lower levels and the projected packaging cost at the higher levels (the cost of the smallest packages) from the total cost constraint, C.
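The three cost-constraint rules above translate into small functions. In this sketch the function names and the defaults COF = 0.25 and CTF = 0.9 are assumptions chosen from the quoted typical ranges. "Smallest package" is read, also by assumption, as the cheapest package at each level.

```python
from typing import Dict, List


def initial_cost_constraint(C: float, cheapest_by_level: Dict[int, float]) -> float:
    """First pass: reserve the cheapest package at every level above level 1."""
    return C - sum(cost for level, cost in cheapest_by_level.items() if level > 1)


def backtrack_cost_constraint(prev_level_cost: float, overrun: float,
                              cof: float = 0.25, ctf: float = 0.9) -> float:
    """Tightened constraint for the previous level when HPP back-tracks."""
    if overrun < prev_level_cost:
        return prev_level_cost - cof * overrun   # small overrun: shave a fraction of it
    return ctf * prev_level_cost                 # large overrun: scale the budget down


def forward_cost_constraint(C: float, actual_lower: List[float],
                            projected_higher: List[float]) -> float:
    """Not back-tracking: subtract actual lower-level and projected higher-level costs."""
    return C - sum(actual_lower) - sum(projected_higher)


print(initial_cost_constraint(10000.0, {1: 20.0, 2: 500.0, 3: 1200.0}))  # 8300.0
print(backtrack_cost_constraint(1000.0, 100.0))                          # 975.0
print(backtrack_cost_constraint(1000.0, 2000.0))                         # 900.0
print(forward_cost_constraint(10000.0, [2000.0, 500.0], [300.0]))        # 7200.0
```

A small COF makes each back-track shave only part of the overrun, while CTF close to 1 shrinks the budget gently when the overrun is too large to subtract outright.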

Hierarchical Partitioning and Package Design (HPP): Algorithm 3.1 presents the hierarchical partitioning and package design algorithm (HPP). HPP has access to a multiway partitioning algorithm (MP, Algorithm 3.2). When partitioning at any level, HPP first determines the cost, area, and switching activity constraints using Set_Constraint and then invokes MP. MP explores the design space by constructing a set of alternative partitions; it returns the first partition that satisfies the constraints or, in the absence of a constraint-satisfying solution, the best-cost solution from the set of partitions.

MP returns a status flag along with a solution (a partition with segments bound to packages). The status takes one of three values, SUCC, BEST, or FAIL, describing respectively the cases where a constraint-satisfying solution is found (a partition that satisfies the constraints, with its segments mapped onto packages from the package library), a solution is found (a valid partition, i.e., one whose segments are mapped onto packages but which does not satisfy the constraints), or no solution is found (no valid partition: one or more partition segments cannot be mapped onto packages). Based on the values of the status flag for the current and previous levels, HPP decides whether to proceed to the next higher level, back-track to the previous level, or terminate and report failure. A hierarchical netlist manager (HN) is used to generate a netlist of the newly generated partition for use at the next higher level.
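The decision HPP makes after each call to MP can be isolated as a pure function. This sketch follows our reading of the case statement in Algorithm 3.1: back-track only when the previous level returned SUCC and the back-track limit MaxBtk has not been reached; otherwise a BEST result advances and a FAIL result terminates. The Status enum and the action strings are hypothetical.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    SUCC = "SUCC"   # constraint-satisfying partition found
    BEST = "BEST"   # valid partition found, but constraints not met
    FAIL = "FAIL"   # some segment cannot be packaged


def next_action(status: Status, prev_status: Optional[Status],
                backtracks: int, max_btk: int) -> str:
    """Return 'advance', 'backtrack', or 'terminate' for the current level."""
    if status is Status.SUCC:
        return "advance"
    # Back-track only if the previous level succeeded and the limit is not reached.
    if prev_status is Status.SUCC and backtracks < max_btk:
        return "backtrack"
    return "advance" if status is Status.BEST else "terminate"


print(next_action(Status.BEST, Status.SUCC, 0, 10))   # backtrack
print(next_action(Status.BEST, Status.SUCC, 10, 10))  # advance
print(next_action(Status.FAIL, None, 0, 10))          # terminate
```

At level 1 there is no previous level, so `prev_status` is `None` and a FAIL ends the search immediately.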

Multiway Partitioning Algorithm (MP): MP (Algorithm 3.2) is built on top of a k-way extension of the Fiduccia-Mattheyses algorithm (KWAY, Algorithm 3.3) [11, 14]. MP first determines the minimum and maximum number of segments that feasible partitions can have and invokes the KWAY algorithm to generate partitions in this feasible range. MP returns with status SUCC if a constraint-satisfying partition is found. When no constraint-satisfying solution is found, MP returns the best solution found with status BEST. If there are no valid partitions (one or more partition segments cannot be packaged), MP returns FAIL.
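MP's feasible range of segment counts follows from how much area and switching activity a single package can absorb. A sketch of that first step of Algorithm 3.2, assuming the lower bound is rounded up to an integer and that at least one segment is required (the listing states neither); the upper bound is the number of nodes in the graph, as in the listing. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def segment_range(area: float, switch: float, pkg_areas: Sequence[float],
                  pkg_switch: Sequence[float], num_nodes: int) -> Tuple[int, int]:
    """(min_seg, max_seg) for MP: enough segments to hold the area and switching load."""
    lower = max(area / max(pkg_areas), switch / max(pkg_switch))
    return max(1, math.ceil(lower)), num_nodes


print(segment_range(950.0, 12.0, [200.0, 300.0, 400.0], [4.0, 5.0], 20))  # (3, 20)
```

Here the switching activity bound (12 / 5 = 2.4) is tighter than the area bound (950 / 400 = 2.375), so at least three segments are needed.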

K-way FM Algorithm (KWAY): Our k-way extension of the FM algorithm (KWAY, Algorithm 3.3) starts by creating a random initial partition of k segments. K-way partitioning is carried out by repeatedly invoking two-way FM (two_way_fm) on pairs of partition segments. two_way_fm tries to improve bi-partitions by moving one node at a time from one partition segment to the other, taking care not to violate the area and switching activity constraints. The two_way_fm algorithm is based on Fiduccia and Mattheyses's bi-partitioning algorithm [11]. two_way_fm is invoked until either a user-specified limit on the total number of iterations is exceeded, or a user-specified limit on the number of iterations over which the partition cost does not improve is exceeded. The best-cost solution found during the iterations is returned as the k-way partition.

Algorithm 3.1 (HPP Algorithm: HierPartPack)
G: input graph (behavioral specification)
P: package set
C: overall cost constraint on design
HN: hierarchical netlist manager
StatArr[k], BtkArr[k]: status of partitioning and number of back-tracks at each level
MaxBtk: user-specified limit on number of back-tracks at any level
k: levels in package hierarchy; level: current level
area: overall area constraint
switch: overall switching activity constraint
cost: cost constraint at current package level

HierPartPack(G, P, C)
begin
    level ← 1;  G_level ← G;  Solution ← null
    while level < k do
        Set_Constraint()
        (status, Solution) ← MP(G_level, P(level), cost, area, switch, level)
        StatArr[level] ← status
        case status is
            SUCC:
                level ← level + 1
                HN :: read_partition(Solution)
                HN :: construct_netlist(level)
            BEST:
                if ((StatArr[level - 1] = SUCC) ∧ (BtkArr[level] < MaxBtk)) then
                    BtkArr[level] ← BtkArr[level] + 1
                    level ← level - 1                  /* back-track */
                else
                    level ← level + 1
                    HN :: read_partition(Solution)
                    HN :: construct_netlist(level)
                end if
            FAIL:
                if ((StatArr[level - 1] = SUCC) ∧ (BtkArr[level] < MaxBtk)) then
                    BtkArr[level] ← BtkArr[level] + 1
                    level ← level - 1                  /* back-track */
                else
                    return (null)
                end if
        end case
        G_level ← HN :: read_netlist(level)        /* retrieve next level netlist */
    end while
    return (Solution)
end

Algorithm 3.2 (Multiway Partitioning Algorithm)
G: input graph;  P: package set
p: individual package from P
area: overall area constraint
switch: overall switching activity constraint
C: cost constraint on design
level: level in package hierarchy

MP(G, P, C, area, switch, level)
begin
    min_seg ← max(area / max_area(p), switch / max_switch(p))
    max_seg ← num_cell(G)                      /* # of nodes in graph */
    best_cost ← ∞;  status ← FAIL
    Solution ← null
    for num_seg = min_seg to max_seg do
        Best ← KWAY(G, P, num_seg, level)      /* generate first partition */
        num_fm_ite ← 1;  num_fm_imp ← 1
        status ← check_constraint(Best, area, switch, C)
        while (status ≠ SUCC ∧ num_fm_ite < MAX_FM_ITE ∧ num_fm_imp < MAX_FM_IMP) do
            S ← KWAY(G, P, num_seg, level)
            status ← check_constraint(S, area, switch, C)
            num_fm_ite ← num_fm_ite + 1
            best_cost_kway ← cost(Best)
            if (status = SUCC) ∨ ((status = BEST) ∧ (cost(S) < best_cost_kway)) then
                Best ← S
            end if
            if (cost(S) < best_cost_kway) then
                num_fm_imp ← 1
            else
                num_fm_imp ← num_fm_imp + 1
            end if
        end while
        if status = SUCC then
            return (status, Best)
        elsif (status = BEST) ∧ (cost(Best) < best_cost) then
            Solution ← Best
            best_cost ← cost(Best)
        end if
    end for
    return (status, Solution)
end

Algorithm 3.3 (k-way FM Algorithm: KWAY)
G: graph G = (V, E); V is a set of vertices and E is a set of edges
P: set of packages;  S: {s_1, s_2, ..., s_k}, a partition of G with k segments

KWAY(G, P, k, level)
begin
    Best ← initialize()                /* create initial partition */
    if level = 1 then                  /* pure behavior specification: estimate attributes */
        for all s ∈ Best do
            Schedule/Performance Estimate s
            and generate A(s), H(s), B(s), and T(s)
        end for
    end if
    best_cost ← 0;  S ← Best;  cont_part ← TRUE   /* S is the working partition */
    ite_cnt ← 1;  imp_cnt ← 1
    for all s ∈ Best do                /* map partition segment to package and find cost */
        best_cost ← best_cost + cost(β(s))
    end for
    while cont_part = TRUE do
        for i = 1 to k - 1 do
            for j = i + 1 to k do
                two_way_fm(s_i, s_j)
            end for
        end for
        if level = 1 then              /* pure behavior specification: estimate attributes */
            for all s ∈ S do
                Schedule/Performance Estimate s
                and generate A(s), H(s), B(s), and T(s)
            end for
        end if
        curr_cost ← 0
        for all s ∈ S do               /* map partition segment to package and find cost */
            curr_cost ← curr_cost + cost(β(s))
        end for
        ite_cnt ← ite_cnt + 1
        if curr_cost < best_cost then
            imp_cnt ← 1;  Best ← S;  best_cost ← curr_cost   /* save best partition seen so far */
        else
            imp_cnt ← imp_cnt + 1
        end if
        if ite_cnt = MAX_ITE ∨ imp_cnt = MAX_IMP then
            cont_part ← FALSE
        end if
    end while
    return (Best)                      /* retrieve best partition */
end
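The core of two_way_fm can be sketched as a single Fiduccia-Mattheyses style pass: each unlocked node is moved at most once, always choosing the legal move with the largest cut gain, and the pass then rolls back to the best prefix of moves. This is a simplified illustration, not the authors' implementation: gains are recomputed by scanning the edge list rather than kept in FM's bucket structure, only an area capacity per side is enforced (switching activity could be checked the same way), and all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, int]  # (node, node, weight)


def cut_size(edges: List[Edge], side: Dict[str, int]) -> int:
    """Total weight of edges whose endpoints lie on different sides."""
    return sum(w for u, v, w in edges if side[u] != side[v])


def gain(node: str, edges: List[Edge], side: Dict[str, int]) -> int:
    """Reduction in cut size if `node` moved to the other side."""
    g = 0
    for u, v, w in edges:
        if node in (u, v):
            other = v if u == node else u
            g += w if side[other] != side[node] else -w
    return g


def fm_pass(edges: List[Edge], area: Dict[str, float],
            side: Dict[str, int], max_area: float) -> Dict[str, int]:
    """One FM-style pass on a bipartition; returns the best side assignment seen."""
    side = dict(side)                      # do not modify the caller's partition
    load = [0.0, 0.0]
    for n, s in side.items():
        load[s] += area[n]
    locked: Set[str] = set()
    moves: List[str] = []
    cut = cut_size(edges, side)
    best_cut, best_len = cut, 0
    while True:
        # Legal moves: unlocked nodes whose move keeps the target side within capacity.
        legal = [n for n in side
                 if n not in locked and load[1 - side[n]] + area[n] <= max_area]
        if not legal:
            break
        n = max(legal, key=lambda x: gain(x, edges, side))
        cut -= gain(n, edges, side)
        load[side[n]] -= area[n]
        side[n] = 1 - side[n]
        load[side[n]] += area[n]
        locked.add(n)
        moves.append(n)
        if cut < best_cut:
            best_cut, best_len = cut, len(moves)
    for n in moves[best_len:]:             # roll back moves made after the best prefix
        side[n] = 1 - side[n]
    return side


edges = [("a", "b", 5), ("c", "d", 5), ("a", "c", 1), ("b", "d", 1)]
area = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
start = {"a": 0, "c": 0, "b": 1, "d": 1}
result = fm_pass(edges, area, start, max_area=3.0)
print(cut_size(edges, start), cut_size(edges, result))  # 10 2
```

Moves with negative gain are still taken during the pass; keeping only the best prefix is what lets FM climb out of shallow local minima.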

Scheduling and Performance Estimation: To evaluate the cost of level 1 partition segments, KWAY invokes the scheduler, which also estimates the performance attributes. Scheduling is the first important step in the high level synthesis process. The input behavioral specification is converted into an equivalent data flow graph (DFG) representation, on which scheduling operates: DFG operations are assigned to specific control steps and are bound to physical ALUs available in the component library. The output of scheduling is a time-stamped, partially bound data flow graph that satisfies the specified constraints. Scheduling determines the execution speed of the synthesized design in terms of the clock speed and the number of clock cycles required to execute all operations. For a given parameterized component library, we can compute the area, average switching activity, and clock speed costs from the schedule produced by the scheduler. We use an implementation of Paulin's force-directed list scheduling [9], extended for communicating and concurrently executing processes [8]. The switching activity estimation technique has been reported in [7].
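Force-directed scheduling starts by computing each operation's time frame from ASAP and ALAP schedules. The sketch below computes only those time frames (and the resulting mobility) for a DFG with unit-delay operations; it is not the force computation itself, and the unit-delay assumption and the function names are ours. It uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def asap(preds: Dict[str, List[str]]) -> Dict[str, int]:
    """Earliest control step of each unit-delay operation."""
    step: Dict[str, int] = {}
    for op in TopologicalSorter(preds).static_order():   # predecessors come first
        step[op] = 1 + max((step[p] for p in preds.get(op, [])), default=0)
    return step


def alap(preds: Dict[str, List[str]], latency: int) -> Dict[str, int]:
    """Latest control step of each operation that still meets `latency` steps."""
    succs: Dict[str, List[str]] = {}
    for op, ps in preds.items():
        succs.setdefault(op, [])
        for p in ps:
            succs.setdefault(p, []).append(op)
    step: Dict[str, int] = {}
    order = list(TopologicalSorter(preds).static_order())
    for op in reversed(order):                           # successors come first
        step[op] = min((step[s] for s in succs[op]), default=latency + 1) - 1
    return step


# DFG: a1 = m1 + m2; a2 = a1 + m3 (m* are multiplications, a* additions)
preds = {"m1": [], "m2": [], "m3": [], "a1": ["m1", "m2"], "a2": ["a1", "m3"]}
early = asap(preds)
late = alap(preds, latency=max(early.values()))
mobility = {op: late[op] - early[op] for op in preds}
print(mobility)  # {'m1': 0, 'm2': 0, 'm3': 1, 'a1': 0, 'a2': 0}
```

Only m3 has slack, so it is the one operation whose placement a force-directed scheduler could shift to balance resource usage across control steps.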

4  Multicomponent  Synthesis 

Multicomponent synthesis is carried out by synthesizing the individual partition segments at level 1. Figure 2 shows how we integrate our hierarchical partitioning environment with a high level synthesis system to produce multicomponent designs with a packaging hierarchy. We call this integrated system MSS (Multicomponent Synthesis System) [10]. Design tradeoffs are made by considering various partitions and carrying out scheduling and performance estimation on the proposed partition segments. The performance attributes of the synthesized RTL designs are determined and compared against the capacity and cost constraints imposed by the packages they are bound to. A global controller is also automatically placed on a partition segment and interconnected with the RTL design segments; it is placed on the partition segment whose package has the most space to fit the controller. Details of the controller model that supports multicomponent partitioning are discussed in [13, 14, 16].
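The controller placement rule (the segment whose package has the most free space) is a simple maximization. A hypothetical sketch, assuming free space is measured as the package area capacity $A(p)$ minus the area $a(p)$ already used by the segment:

```python
from typing import Dict, Optional, Tuple


def controller_host(segments: Dict[str, Tuple[float, float]],
                    controller_area: float) -> Optional[str]:
    """Pick the segment whose package has the most free area, if the controller fits.

    segments maps a segment name to (package area capacity A(p), area used a(p)).
    """
    if not segments:
        return None
    free = {name: cap - used for name, (cap, used) in segments.items()}
    host = max(free, key=free.get)
    return host if free[host] >= controller_area else None


segs = {"s1": (100.0, 80.0), "s2": (150.0, 90.0), "s3": (120.0, 100.0)}
print(controller_host(segs, 30.0), controller_host(segs, 70.0))  # s2 None
```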
Figure 2: Hierarchical Behavioral Partitioning for Multicomponent Synthesis


At the end of multicomponent synthesis and hierarchical package design, we have a multicomponent design composed of interacting RTL design segments. The behavioral partitioning phase produces multiple behavior segments that are completely synthesized into RTL designs using a high level synthesis system such as DSS [12, 13]. It also produces a hierarchical structural design (whose leaf nodes are the individual RTL designs) that is mapped onto efficient, cost-effective packages from a package library. We functionally validate our approach by simulating the hierarchical RTL design and the input behavior with the same set of test vectors and comparing their outputs.

5  Results 

We present results for a number of examples to demonstrate the validity of our behavioral partitioning approach for multicomponent synthesis and hierarchical package design. Details of our package library are shown in Table 1, which lists the area, pin, switching activity, and clock speed constraints supported by each package, together with the cost of the package. Table 2 gives the number of lines of code in the behavior level VHDL specification and the number of processes for each of our examples.

Move Machine: The instruction set of the Move Machine controls instruction and data flow; it does not compute any data values. ALU operations are assumed to be memory mapped. Fifo: Fifo models a producer-consumer problem. Shuffle: The Shuffle is a high speed, reconfigurable 32-bit shuffle-exchange network for parallel signal processing; the shuffle exchange is a commercial product of Texas Instruments, Inc. dyn: dyn is a five-process description that monitors and maintains the dynamic length and the maximum length to which a queue in a producer-consumer problem grows. alu: alu is a nine-process description of an arithmetic logic unit. dyn1-dyn10 and alu1-alu5 are multiple processing elements generated by making multiple instantiations of dyn and alu, respectively.

Table 1: Package Alternatives

5.1 Multicomponent Synthesis and Hierarchical Package Design

Tables 3 and 4 present results of multicomponent synthesis and hierarchical package design for the design examples in Table 2 with the package library shown in Table 1. For the smaller examples (Move Machine through dyn2), Table 3 presents: (1) the number of processes; (2) the hierarchical partition segments mapped onto packages from the package library (at level 1, the partitioning of processes into segments that are eventually synthesized into RTL designs); (3) the actual number of back-tracks made by the hierarchical partitioning and package design algorithm, and the limit on the number of back-tracks (BTK); (4) the actual cost of the design and the cost constraint; and (5) the execution time. With a large number of processes it is difficult to present the assignment of processes to partition segments; hence, for dyn3 and dyn4, Table 3 presents the number of processes on each level 1 partition segment instead of the individual partitions. With an even larger number of processes, it is difficult to present even the details of the level 2 partition segments. Thus, Table 4 presents the following data for all designs in Table 2: (1) number of processes; (2) number of back-tracks/BTK; (3) actual cost/constraint; and (4) execution time.

Table 2: Design Data for Examples

For each example, the cost constraint was progressively tightened until the algorithm failed to find a cost-satisfying solution. In all cases, if a constraint-satisfying solution existed, the algorithm discovered it; this was verified by manual examination of the examples. These results establish the validity of the algorithm. An interesting observation that vindicates our choice of back-tracking algorithm is that, across all our examples, the algorithm never back-tracks more than three times (Table 4). This is because the algorithm back-tracks only if it can potentially find a solution with a better cost, and because it converges to a constraint-satisfying solution fairly rapidly.

Multicomponent Synthesis vs Hierarchical RTL Partitioning: We also developed a hierarchical RTL partitioner [14] as an alternative approach. Here, we synthesize the input behavior and then partition the resulting RTL design. Table 5 compares the hierarchical behavioral partitioning and hierarchical RTL partitioning approaches. Blanks indicate that the input design was too large to be handled by the RTL partitioner. For each example, the better dollar-cost solution is bold-faced. RTL partitioning yields better designs for smaller examples, where the number of synthesized RTL components is relatively small (< 200). For larger examples, multicomponent synthesis clearly outperforms RTL partitioning in the quality of solutions. Also, RTL partitioning takes an order of magnitude more time than multicomponent synthesis (two orders of magnitude for the larger examples, alu4 and dyn3).

Hierarchical Package Design without Scheduling: Since scheduling and performance estimation are time consuming, we modified HPP and KWAY by replacing the scheduling and performance estimation steps with approximations for area and switching activity. In this approach, the individual processes are first scheduled and performance estimated. Then, for level 1 segments, the area and switching activity costs of the individual processes in a segment are summed to obtain the total area and switching activity of the segment. These numbers are then adjusted by a small percentage (10-30%) to account for the possible sharing of resources had the processes actually been scheduled together [14]. Table 6 presents results of hierarchical partitioning and package binding with and without an integrated scheduling and performance estimation step. The better dollar cost for each example is bold-faced. "Invalid" indicates that at least one of the level 1 partition segments does not fit on any available package, so the design is not valid. The approach with scheduling outperforms the approximation method, especially for the larger designs. However, (a) the execution time of the approximation method is very small, and (b) its estimated packaging cost is fairly close to the solutions reported by the algorithm with embedded scheduling. This observation indicates that the approximation algorithm should be used to quickly generate approximate dollar cost constraints to be imposed on the rigorous algorithm.
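The approximation amounts to summing per-process estimates and applying a sharing correction. In this sketch the function name is hypothetical, and, as an assumption, the adjustment is applied as a single reduction factor to both area and switching activity; the text only states that the summed numbers are adjusted by 10-30%.

```python
from typing import Sequence, Tuple


def approx_segment_cost(proc_areas: Sequence[float], proc_switch: Sequence[float],
                        sharing: float = 0.2) -> Tuple[float, float]:
    """Approximate (area, switching activity) of a level-1 segment without scheduling.

    Per-process estimates are summed, then reduced by `sharing` (the text quotes
    10-30%) to account for resources the processes could share if scheduled together.
    """
    factor = 1.0 - sharing
    return sum(proc_areas) * factor, sum(proc_switch) * factor


print(approx_segment_cost([100.0, 200.0, 100.0], [1.0, 2.0, 1.0], sharing=0.25))
# (300.0, 3.0)
```

Because the per-process estimates are computed once, evaluating a candidate segment costs only a sum, which is why this variant runs so much faster than rescheduling every segment.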

6  Conclusions  and  Discussion 

We have presented a hierarchical behavioral partitioning and package design algorithm, and demonstrated a methodology for integrating our partitioner with a high level synthesis tool to create a multicomponent synthesis and hierarchical package design environment, MSS (Multicomponent Synthesis System) [10]. MSS takes as input a multiprocess VHDL behavior, a parameterized component library, a package library, and an overall cost constraint on the design, and generates a hierarchical RTL design while simultaneously constructing a physical package hierarchy for the design.

Table 3: Multicomponent Synthesis with Hierarchical Package Design Results
Note: s-p denotes the mapping of segment s onto package p from the package library. Also, at level 1, the number of processes on each partition segment is presented.

Table 4: Multicomponent Synthesis and Package Design Results (Contd ...)

Example   No. of Procs   Num BkTrk/BTK   Cost/Constraint ($)   Execution Time
dyn5      25             0/10            8350/8000             349.5
alu3      27             0/10            12700/8000            579
dyn6      30             1/10            9850/9000             1470.7
dyn7      35             2/10            11200/10000           3141
alu4      36             3/10            14100/15000           1549.4
dyn8      40             1/10            11850/12000           1863.5
dyn9      45             1/10            13800/13000           3684.1
alu5      45             (illegible)     17750/18000           1626.4
dyn10     50             2/10            16850/15000           6452.2

Table 5: Behavioral Partitioning vs RTL Partitioning approaches

Table 6: Multicomponent Synthesis: With vs Without Scheduling

We presented results to evaluate the performance of the approach with respect to the quality of the designs produced and the execution times for a number of design examples. Hierarchical RTL partitioning and package design yields good results for examples where the number of RTL components in the synthesized design is less than 200. When partitioning at the RTL netlist level, the design architecture is frozen (during high-level synthesis). Alternate multichip designs cannot be explored during hierarchical RTL partitioning, whereas MSS explores the design space by considering alternate implementations during high-level synthesis. Also, thermal profiling of RTL designs is too time-consuming and is not viable for large designs. For almost all the examples, MSS produces better results and executes much faster than hierarchical RTL partitioning. For smaller designs, scheduling overhead can be reduced through approximate estimation procedures that evaluate the cost of level 1 segments from individual process costs. From the results, we infer that hierarchical behavioral partitioning is both a suitable and a viable approach to multicomponent synthesis and hierarchical packaging.
Resource Constrained RTL Partitioning for Synthesis of Multi-FPGA Designs


Abstract 

In this paper we address the problem of partitioning register-level designs for implementation on multiple FPGAs. The partitioner uses a modified multi-way Fiduccia-Mattheyses (FM) algorithm. We have developed the cost estimation functions the partitioner needs to estimate the resources required by the design on an FPGA. The methodology for estimating resources on an FPGA device and for partitioning the design is discussed in detail. In this paper, we use the Xilinx XC4000 family of FPGAs as our target architecture. Within this family, a heterogeneous selection of FPGA devices can be used.

1  Introduction 

A design that has to be implemented on a Field Programmable Gate Array (FPGA) needs certain resources on the FPGA device. The kind of resources on the chip depends on the target architecture. For the Xilinx architecture of FPGAs, these resources include the Configurable Logic Blocks (CLBs), which contain the Function Generators (FGs) and Flip-Flops (FFs). If the design to be downloaded onto an FPGA needs more resources than are available on one device, the design must be partitioned into multiple segments such that each partition segment satisfies the resource constraints of the available devices. To achieve this goal, we formulate the multi-FPGA partitioning problem for Register Transfer Level (RTL) designs as follows:

Given  a  register  level  design  represented  as  a  net-list  of  components  and  constraints  in  terms  of 
maximum  number  of  available  CLBs,  function  generators,  flip  flops  and  allowable  user  I/O  pins 
on  each  chip,  partition  the  design  into  a  set  of  interconnected  design  segments,  each  of  which 
satisfies  the  constraints. 


The partitioning system creates one or more bit-map files, depending on the specified constraints. Each bit-map file can be downloaded onto a Xilinx XC4000 family FPGA. The input to the system is a register-level design, which consists of a data-path and a controller. The data-path consists of a collection of components selected from a known parameterized component library. This library has various components such as adders, subtractors, multipliers, dividers, latches and multiplexers. The controller is specified as an algorithmic behavioral description of a finite state machine. These components are discussed in detail in later sections of the paper.

2  Integration  of  tools  for  Synthesis  and  partitioning  of  FPGA  designs 

We use a high-level synthesis system that takes behavioral descriptions as input and produces register transfer level descriptions of the same design. This system, the Distributed Synthesis System (DSS), was developed at the University of Cincinnati [?]. It produces the register transfer design in two parts, namely the 'data-path' and the 'controller'. The data-path is represented as a net-list of register transfer level components, and the controller is represented as a finite state machine.

The input to the multi-FPGA partitioning system is a register-level design (the output of DSS), and the constraints are the available FPGAs, the library of RTL components, and the resource utilizations of all register-level components for varying generic parameters. The resource estimator and the partitioning algorithm form the central components of the partitioning process. Resource estimation involves accurately estimating the resources the design needs, and partitioning involves the proper choice of design segments that satisfy the user-specified constraints. The resources here refer to the number of CLBs (packed CLBs), the number of function generators, and the number of flip-flops. Pin constraints are also taken into account while determining the partitions. If the design cannot be partitioned into the available number of chips, each with the allowed number of I/O pins, the partitioner returns the best possible solution obtained during the specified number of iterations of the K-way FM partitioning algorithm.

Figure 1: Multi-FPGA synthesis flow

The  resource  estimator  works  on  the  data-path  and  the  controller  separately  and  gives  estimates  for  the  overall 
design  using  these  individual  estimates.  Once  the  estimation  is  done,  it  is  determined  whether  the  given  design 
needs  to  be  partitioned  or  not  depending  on  the  resources  needed  by  the  design  and  the  specified  selection  of 
FPGA  devices.  The  partitioner  is  invoked  if  needed.  It  uses  a  modified  multi-way  Fiduccia-Mattheyses  algorithm 
[?],  discussed  later,  to  produce  partition  segments  which  satisfy  the  constraints.  These  partitions  are  used  as 
input  to  logic  synthesis  tools  to  generate  bit-map  files.  The  design  flow  for  obtaining  programmed  bit-map  files 
for FPGAs is shown in Figure 1. We use the Synopsys Design Analyzer for logic synthesis of the partitioned RTL designs. This produces a gate-level net-list of the design in terms of hard macros and function generators from the Xilinx library. Since our target implementation is Xilinx FPGA devices, we use the Xilinx XDM tools to generate layouts and produce the bit-map files necessary to download the design onto the FPGA.

3  RTL  component  library 

The data-path part of the register-level design contains components selected from an RTL component library. These components in turn use hard macros from the Xilinx XC4000 family. The descriptions of these components were initially written at the behavioral level; ideally, the synthesis tools should be able to understand all of the currently available target architectures and synthesize the descriptions for a particular target architecture. In this process, however, the synthesis tool might produce a gate-level design that violates the area constraints when taken to layout, or synthesis might be so computationally intensive that it takes several hours. To overcome these problems, the register-level components in our library instantiate the Xilinx library components directly and are thus targeted to the Xilinx XC4000 family of FPGAs. These components are parameterized for varying values of bit-width. In addition, components such as the multiplexer are parameterized for other generics, such as the number of inputs and the number of select lines. Table 1 shows the components and the corresponding Xilinx hard macros used.

Component      Xilinx hard macros used
Const_reg      Buf
Adder          Add1, Vcc, Inv
Subtractor     Add1, Gnd, Inv
Comparator     Nor2, And2b1, And2, Inv, And3b1, Xnor2
Latch          FDCE
Multiplexer    M2_1
Shift_reg      And2, Or2, Or3
Signal         And2, And3, And4, Inv, FDPE, Or2, Xnor2, FDCE
And            And2
Or             Or2
Nor            Nor2
Xor            Xor2
Xnor           Xnor2
Not            Inv

Table 1: RT level Components in component library instantiated in RT level code

Each library module is characterized for the number of CLBs, function generators and flip-flops for different values of the generic parameters. This characterized data is made available to the partitioning tool in the format shown in Table 2. The data was obtained experimentally by synthesizing several instances of each component with varying generic parameter values and generating the Xilinx LCA (Logic Cell Array) files. In this table, FF, FG and CLB denote the necessary number of flip-flops, function generators and CLBs, respectively, for each component. Each entry in the table is of the form (x, y), where x is the bit-width of the component and y is the resource count needed. Note that Table 2 shows only a small selection of the data for our library. For example, the resource utilizations for the multiplexer module are for 2 inputs and 1 select line.
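As a minimal sketch of how the estimator might hold and query the characterized data, the Python snippet below stores a few rows of Table 2 in a dictionary keyed by module and resource type. The dictionary layout and the `lookup` helper are hypothetical choices for illustration; the component-data file format is not specified beyond Table 2, and the sketch simply refuses bit-widths that were not characterized.

```python
from typing import Dict, Tuple

# Hypothetical in-memory copy of a few Table 2 rows:
# (module, resource) -> {bit_width: resource_count}
COMPONENT_DATA: Dict[Tuple[str, str], Dict[int, int]] = {
    ("Adder", "FG"): {1: 1, 2: 3, 4: 6, 8: 12, 12: 18, 16: 24, 20: 30, 32: 48, 64: 64},
    ("Adder", "CLB"): {1: 1, 2: 1, 4: 3, 8: 6, 12: 9, 16: 12, 20: 15, 32: 24, 64: 32},
    ("Signal", "FF"): {1: 7, 2: 12, 4: 19, 8: 35, 16: 67, 32: 80},
    ("Signal", "FG"): {1: 3, 2: 5, 4: 7, 8: 13, 16: 25, 32: 40},
}


def lookup(module: str, resource: str, bit_width: int) -> int:
    """Return the characterized count of `resource` for one `module` instance.

    Only characterized bit-widths are accepted; the text does not say how
    other widths are handled, so this sketch raises KeyError instead.
    """
    table = COMPONENT_DATA[(module, resource)]
    if bit_width not in table:
        raise KeyError(f"{module} has no {resource} data for bit-width {bit_width}")
    return table[bit_width]


print(lookup("Adder", "FG", 16))  # 24
```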

4  Resource  Estimator 

Functions for estimating the number of function generators, flip-flops and CLBs needed by an input design have been developed. The resources needed by a design represented as a data-path and a controller can be estimated by considering each of these entities separately.


Estimates for data-path: We estimate the number of function generators and flip-flops needed by the data-path and use this data in determining the number of CLBs needed by the whole design when it is taken to layout. To estimate the number of function generators and flip-flops, we add up these values for all the instantiated components. The generic values used to instantiate the various register transfer level components, and their respective resource utilizations, are obtained through table lookup from the system database (Table 2). Logic trimming done at gate level is taken into account by reading the input net-list and determining the signals that are not used, in other words, the load-less signals, if any, in the design. This does not happen frequently in synthesized designs. However, for modules such as the “signal” module in Table 2, which contains multiple flip-flops, the outputs of some flip-flops may not be used. The flip-flops FDPE and FDCE are the hard macros used in Xilinx FPGAs to store the bits in clocked components. Once the load-less signals are determined, the number of flip-flops used to store these signals is subtracted from the sum of the individual component flip-flop counts in the design. This gives the number of flip-flops necessary for the data-path. A similar procedure is followed to obtain an estimate of the function generators used by the data-path.


• No. of flip-flops needed by the data-path:

  $$FF_{dp} = \sum_{r \in dp} FF\_count(r) - \sum_{s \in L} UFF\_count(s)$$

  where $FF\_count(r)$ is the number of flip-flops of register-level component $r$ in the data-path, $UFF\_count(s)$ is the unused flip-flop count of the component whose output signal is $s$, and $L$ is the set of load-less (unconnected) signals in the data-path.

• No. of function generators needed by the data-path:

  $$FG_{dp} = \sum_{r \in dp} FG\_count(r) - \sum_{s \in L} UFG\_count(s)$$

  where $FG\_count(r)$ is the number of function generators of register-level component $r$ in the data-path, $UFG\_count(s)$ is the number of unused function generators of the component whose output signal is $s$, and $L$ is the set of load-less signals in the data-path.

Since each CLB in the XC4000 family of FPGAs has 2 function generators and 2 flip-flops, the number of packed CLBs needed is taken to be half the number of flip-flops (function generators) for designs dominated by sequential (combinational) logic, that is, by flip-flops (function generators). That is,

  $$\text{No. of packed CLBs needed by the data-path} = 0.5 \times \max(FF_{dp}, FG_{dp})$$
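The two sums and the packing rule can be sketched in Python as follows. The function name and the tuple-based inputs are illustrative assumptions: here the per-component counts are passed in directly, whereas the tool obtains them from the Table 2 lookup and obtains the load-less set L by reading the net-list.

```python
from typing import List, Tuple


def datapath_estimate(components: List[Tuple[int, int]],
                      loadless: List[Tuple[int, int]]) -> Tuple[int, int, float]:
    """Estimate (FF_dp, FG_dp, packed CLBs) for the data-path.

    components: (FF_count(r), FG_count(r)) for every instantiated RTL component r.
    loadless:   (UFF_count(s), UFG_count(s)) for every load-less signal s in L.
    """
    ff_dp = sum(ff for ff, _ in components) - sum(uff for uff, _ in loadless)
    fg_dp = sum(fg for _, fg in components) - sum(ufg for _, ufg in loadless)
    # Each XC4000 CLB holds 2 FGs and 2 FFs, so the dominant resource sets the count.
    clb_dp = 0.5 * max(ff_dp, fg_dp)
    return ff_dp, fg_dp, clb_dp


# A 16-bit adder (0 FF, 24 FG) and an 8-bit "signal" module (35 FF, 13 FG) from
# Table 2, with one hypothetical load-less output that frees a single flip-flop.
print(datapath_estimate([(0, 24), (35, 13)], [(1, 0)]))  # (34, 37, 18.5)
```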


Estimates for controller: The necessary number of function generators and flip-flops in the controller part of the design can be estimated by studying the description of the finite state machine (FSM). The number of states in the controller is the main factor that determines the amount of logic required on the chip. The number of state variables depends on the number of states in the FSM and is given by $\lceil \log_2(\text{number of states}) \rceil$. A register whose bit-width equals the number of state variables is needed to store the present-state and next-state variables.

Module name   Resource   (Bit-width, Resource count)
Multiplexer   FF         (1,0), (2,0), (4,0), (8,0), (16,0)
              FG         (1,1), (2,2), (4,4), (8,8), (16,32)
              CLB        (1,1), (2,1), (4,2), (8,4), (16,16)
Signal        FF         (1,7), (2,12), (4,19), (8,35), (16,67), (32,80)
              FG         (1,3), (2,5), (4,7), (8,13), (16,25), (32,40)
              CLB        (1,4), (2,6), (4,10), (8,17), (16,34), (32,40)
Comparator    FF         (1,0), (2,0), (4,0), (8,0), (16,0), (32,0)
              FG         (1,3), (2,8), (4,18), (8,37), (16,38), (24,60), (32,157)
              CLB        (1,1), (2,4), (4,9), (8,18), (16,19), (24,30), (32,78)
And           FF         (1,0), (2,0), (4,0), (8,0), (16,0), (32,0), (64,0)
              FG         (1,1), (2,2), (4,4), (8,8), (16,16), (32,32), (64,64)
              CLB        (1,1), (2,1), (4,2), (8,4), (16,8), (32,16), (64,32)
Or            FF         (1,0), (2,0), (4,0), (8,0), (16,0), (32,0), (64,0)
              FG         (1,1), (2,2), (4,4), (8,8), (16,16), (32,32), (64,64)
              CLB        (1,1), (2,1), (4,2), (8,4), (16,8), (32,16), (64,32)
Nor           FF         (1,0), (2,0), (4,0), (8,0), (16,0), (32,0), (64,0)
              FG         (1,1), (2,2), (4,4), (8,8), (16,16), (32,32), (64,64)
              CLB        (1,1), (2,1), (4,2), (8,4), (16,8), (32,16), (64,32)
Xor           FF         (1,0), (2,0), (4,0), (8,0), (16,0), (32,0), (64,0)
              FG         (1,1), (2,2), (4,4), (8,8), (16,16), (32,32), (64,64)
              CLB        (1,1), (2,1), (4,2), (8,4), (16,8), (32,16), (64,32)
Xnor          FF         (1,0), (2,0), (4,0), (8,0), (16,0), (32,0), (64,0)
              FG         (1,1), (2,2), (4,4), (8,8), (16,16), (32,32), (64,64)
              CLB        (1,1), (2,1), (4,2), (8,4), (16,8), (32,16), (64,32)
Const_reg     FF         (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0), (16,0)
              FG         (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0), (16,0)
              CLB        (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0), (16,0)
Adder         FF         (1,0), (2,0), (4,0), (8,0), (16,0), (32,0), (64,0)
              FG         (1,1), (2,3), (4,6), (8,12), (12,18), (16,24), (20,30), (32,48), (64,64)
              CLB        (1,1), (2,1), (4,3), (8,6), (12,9), (16,12), (20,15), (32,24), (64,32)
Subtractor    FF         (1,0), (2,0), (4,0), (8,0), (16,0), (32,0), (64,0)
              FG         (1,1), (2,3), (4,6), (8,12), (12,18), (16,24), (32,48), (64,64)
              CLB        (1,1), (2,1), (4,3), (8,6), (12,9), (16,12), (20,15), (32,24), (64,32)
Shift_reg     FF         (1,3), (2,4), (3,6), (4,8), (5,10), (6,12), (7,14), (8,16), (16,32), (32,40)
              FG         (1,1), (2,4), (3,6), (4,8), (5,10), (6,12), (7,14), (8,16), (16,32), (32,40)
              CLB        (1,1), (2,2), (3,3), (4,4), (5,5), (6,6), (7,7), (8,7), (16,16), (32,20)
Not           FF         (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0), (16,0), (32,0)
              FG         (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0), (16,0), (32,0)
              CLB        (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0), (16,0), (32,0)

Table 2: Data provided in Component-data file (input to estimator)

The elaborated FSM is represented by the logic synthesis tool as a set of multiplexers and gates. The size of the inputs and select signals to the multiplexers was found to be proportional to the control word length, and the number of function generators was found to be proportional to the number of states in the FSM. We conducted a series of experiments and found that the FSM needs fewer function generators if the control word depends only on the current state (Moore machine) rather than on the current state and the control inputs (Mealy machine). The number of gates (and hence the number of function generators) needed by the FSM depends on the number of nested 'case' statements in the FSM description in VHDL. This implies that every time the input flags or input state bits are checked to assign a value to an output of the FSM, the number of gates/function generators increases. Over a large number of designs, the increase was found to be 3 function generators for each nested 'case' statement; hence the factor F = 3 in the equation below. The length of the control word, which is the output of the FSM, does not significantly affect the amount of logic necessary for the controller. On the other hand, the number of states in the controller has a major influence on the resources needed on an FPGA. We found an almost exponential increase in the number of function generators necessary with increasing states in a controller. This is due to the extra logic needed for assigning values to the signals for each state included in the finite state machine. The exponent S was determined to be 2.0. This was found by varying the number of states in the controller, keeping all the other factors constant, and producing the LCA of the FSM.


• No. of flip-flops needed by the controller:

  $$FF_c = N_{sv}$$

• No. of function generators needed by the controller:

  $$FG_c = N_{sv}^{S} + C \cdot N_{cw} + F \cdot N_{case}$$

  where $N_{sv}$ is the number of state variables in the FSM, $N_{cw}$ is the number of bits in the control word, $N_{case}$ is the number of nested 'case' statements, and $S = 2.0$, $C = 0.3$ and $F = 3$.

Since the number of function generators in an FSM is usually much larger than the number of state variables (number of flip-flops) in the FSM, the number of packed CLBs needed by the controller is estimated to be half the number of function generators:

• No. of packed CLBs needed by the controller:

  $$CLB_c = 0.5 \times \max(FF_c, FG_c)$$
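A minimal Python sketch of the controller estimate follows, using the constants S, C and F reported above. It assumes binary state encoding (so the state-variable count is the ceiling of log2 of the number of states) and takes the nested 'case' count as a given input; the function name is hypothetical.

```python
import math
from typing import Tuple

# Empirical constants reported in the text.
S, C, F = 2.0, 0.3, 3


def controller_estimate(num_states: int, control_word_bits: int,
                        nested_cases: int) -> Tuple[int, float, float]:
    """Estimate (FF_c, FG_c, packed CLBs) for the FSM controller.

    Assumes num_states >= 1 and binary state encoding.
    """
    state_vars = math.ceil(math.log2(num_states))  # one flip-flop per state variable
    ff_c = state_vars
    fg_c = state_vars ** S + control_word_bits * C + nested_cases * F
    clb_c = 0.5 * max(ff_c, fg_c)
    return ff_c, fg_c, clb_c


# 76 states and a 70-bit control word (as for "Find" in Table 4), with a
# hypothetical count of 37 nested 'case' statements: 49 + 21 + 111 = 181.
print(controller_estimate(76, 70, 37))  # (7, 181.0, 90.5)
```

The 76-state example gives 7 state flip-flops, which agrees with the flip-flop count reported for that controller; the nested 'case' count is not given in the text and is made up here.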


Estimates for the complete design: The estimates of resources needed by the data-path and the controller are used to determine the number of function generators, flip-flops and packed CLBs needed by the complete design.

• Number of function generators in the complete design:

  $$FG_{rtl} = FG_{dp} + FG_c$$

  For tighter estimates, a multiplication factor $G$, the global optimization factor, can be applied, giving $FG_{rtl} = G \times (FG_{dp} + FG_c)$. By synthesizing and analyzing a large number of designs, we found that the global optimization factor lies between 0.6 and 0.9, depending on the amounts of combinational and sequential logic in the design.

• Number of flip-flops in the complete design:

  $$FF_{rtl} = FF_{dp} + FF_c$$

• Packed CLB count for the complete design:

  $$CLB_{rtl} = 0.5 \times \max(FG_{rtl}, FF_{rtl})$$
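The three formulas for the complete design combine as in the Python sketch below. The function name is hypothetical, and the value G = 0.9 in the example is our own pick from the reported 0.6 to 0.9 range, not a value stated in the text.

```python
from typing import Tuple


def design_estimate(fg_dp: float, fg_c: float, ff_dp: int, ff_c: int,
                    g: float = 1.0) -> Tuple[float, int, float]:
    """Combine data-path and controller estimates for the complete design.

    g is the global optimization factor G; g = 1.0 gives the plain sum.
    """
    fg_rtl = g * (fg_dp + fg_c)
    ff_rtl = ff_dp + ff_c
    clb_rtl = 0.5 * max(fg_rtl, ff_rtl)
    return fg_rtl, ff_rtl, clb_rtl


# TLC estimates from Tables 3 and 4 (FG: 47 + 109, FF: 48 + 6), assuming G = 0.9.
fg, ff, clb = design_estimate(47, 109, 48, 6, g=0.9)
print(round(fg, 1), ff, round(clb, 1))  # 140.4 54 70.2
```

These values are in line with the TLC estimates listed in Table 5 (140 function generators, 54 flip-flops, 70 packed CLBs).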
5  Partitioning  Algorithm  For  Producing  Multiple  FPGA  designs

Partitioning  of  a  design  involves  determining  the  constraints  such  as  the  overall  resource  utilizations  on  an  FPGA, 
and  constructing  the  partitions  subject  to  these  constraints. 

The partitioner takes the input RTL net-list and produces multiple RTL design segments. Each segment is subject to the following constraints:

1. Resource Constraint: The resources required by any segment of the design should not exceed the maximum values allowed by the user for the particular FPGA part number. The constraints here refer to the number of CLBs, function generators and flip-flops present in the FPGA devices made available to the partitioning system. Let these constraints be denoted by $CLB_s$, $FG_s$ and $FF_s$, where $s$ denotes an available FPGA device.

2. Pin Constraint: Because of the limited number of user I/O pins on any FPGA chip, we partition the input design into segments such that the interconnect between the segments does not need more I/O pins than are available. In other words, the number of pins on any segment should not exceed the allowed number of user I/O pins $P_s$ on each chip. This is checked for all the partitions of the design.

3. Overall design constraint: The number of segments after partitioning should not exceed the allowed number of FPGA devices.

We use the modified multi-way Fiduccia-Mattheyses algorithm [?] for partitioning an input design into multiple design segments. The multi-way FM partitioning algorithm used is shown in Figure 2. The partitioner begins by reading the package library, which contains information about the available FPGA devices in the format shown in Table 6. The number of partitions is initialized to 1, and the FM partitioner is invoked with the number of partitions and the package library. The FM partitioner in turn invokes the K-Way Fiduccia-Mattheyses algorithm, which works on the input design (in the form of a graph G), the number of required partitions and the FPGA package library. It returns Result, a flag indicating whether all the partition segments have been assigned a device. When Result is false (that is, not all partition segments have a fitting chip among the devices made available by the user), the partitioner increments the number of required partitions and repeats the above process until either a successful mapping of partition segments to FPGA devices is found or the number of partitions exceeds the total number of available devices.
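The outer loop just described can be sketched in Python with the FM()/K_way() step abstracted as a callback. The callback signature and the `badness` score used to remember the best failed attempt are assumptions for illustration; the text only says that the best possible partition is returned when no mapping is found.

```python
from typing import Any, Callable, Optional, Tuple

# fm(k) -> (all_segments_assigned, partition, badness). `badness` is a
# hypothetical score (lower is better) used only to remember the best
# failed attempt; the text does not say how "best possible" is measured.
FmCallback = Callable[[int], Tuple[bool, Any, float]]


def multi_way_fm_partitioner(available_num_of_chips: int,
                             fm: FmCallback) -> Tuple[bool, Optional[Any]]:
    """Try 1, 2, ... partitions until every segment is assigned a chip."""
    best: Optional[Any] = None
    best_badness = float("inf")
    num_of_partitions = 1
    while num_of_partitions <= available_num_of_chips:
        ok, partition, badness = fm(num_of_partitions)
        if ok:                      # Result = 1: every segment has a device
            return True, partition
        if badness < best_badness:  # remember the best failed attempt
            best, best_badness = partition, badness
        num_of_partitions += 1
    # Result = 0: the caller should prompt the user for bigger FPGA devices.
    return False, best
```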

The K-Way partitioner initializes the total number of CLBs, function generators and flip-flops for the whole design, which are obtained as output of the estimation functions. It then determines the minimum number of FPGAs needed by the design. This is calculated as

  $$\text{Minimum number of partition segments} = \left\lceil \frac{\text{Packed CLBs for complete design}}{MaxCLB} \right\rceil$$

where $MaxCLB$ is the number of CLBs on the largest available FPGA device.
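A short Python sketch of this lower bound follows; the helper name is hypothetical, and the CLB figures are those of the parts listed in Table 6. The bound ignores pins and the FG/FF limits, which is why the outer loop may still have to raise the number of partitions.

```python
import math
from typing import Iterable


def min_segments(packed_clbs: float, device_clbs: Iterable[int]) -> int:
    """Lower bound on partition segments: ceil(CLB_rtl / MaxCLB)."""
    max_clb = max(device_clbs)  # CLBs on the largest available device
    return math.ceil(packed_clbs / max_clb)


# CLB counts of the parts listed in Table 6.
TABLE6_CLBS = [64, 100, 196, 400]
print(min_segments(255, TABLE6_CLBS))     # 1
print(min_segments(255, [64, 100, 196]))  # 2
```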

Once the number of chips is determined, the K-Way FM algorithm creates a random initial partition of N partition segments, where N is the minimum number of chips. As a result, the graph G of V vertices is partitioned into N segments, each with a fixed number of nodes. The initial partition is saved as the Best partition. The pins on all partitions are calculated by Compute_pins_on_all_partitions(), and the value is saved as best_pins. K-Way partitioning is carried out by repeatedly invoking the standard FM bi-partitioning algorithm [?] on pairs of partition segments. two_way_fm() tries to improve a bi-partition by moving one node at a time from one partition segment to the other. Each time a move is made, Check_chip_constraint(S) is invoked to ensure that each partition segment satisfies the constraints. This function checks whether each segment of partition S satisfies the constraints on the CLBs, function generators, flip-flops and pins available on the chip, and returns the status to K-Way FM. The status is true if the constraints are met by the partition segments and false otherwise. Once a chip is found on which a partition segment fits, the device part number is assigned to that partition segment.

Multi_way_FM_Partitioner()
begin
    Package_ptr ← Read_package_data()
    Num_of_partitions ← 1
    while (Num_of_partitions <= Available_num_of_chips) do
        Result ← FM(Num_of_partitions, Package_ptr)
        if (Result = 1) then
            return partitions
        else
            Num_of_partitions ← Num_of_partitions + 1
        end if
    end while
    if (Result = 0) then
        return Best possible partition and prompt user for bigger FPGA devices
    end if
end

int FM(Num_of_partitions, Package_ptr)
begin
    Initialize_private_data()
    K_way(G, Num_of_partitions, Package_ptr)
    Result ← Check_assigned_chips()
    return Result
end

Figure 2: Algorithm for Multi-Way FM partitioning

The K-Way FM algorithm is invoked repeatedly until either (1) a solution that satisfies the specified constraints is found, or (2) a user-specified limit on the number of iterations (MAX_ITER_CNT) is exceeded. The partitioner returns either a set of partition segments that satisfy the constraints or the best possible solution (if the constraints could not be met by all the partitions).
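The K-Way loop of Figure 3 can be sketched in Python as below, as a simplified illustration under stated assumptions: the design is a plain graph with unit edge weights, the number of cut edges stands in for Pins(S), and `fits` is a hypothetical per-move capacity test standing in for the per-move constraint check. two_way_fm is a single FM-style pass (the highest-gain unlocked node is moved and locked, and the best prefix of moves is kept) without the bucket data structure of the original Fiduccia-Mattheyses method, so each pass is quadratic rather than linear time.

```python
import itertools
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]
Part = Dict[str, int]                    # node -> segment index
Fits = Callable[[Part, str, int], bool]  # may node v move to segment dest?


def build_adj(edges: List[Edge]) -> Dict[str, List[str]]:
    adj: Dict[str, List[str]] = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def cut_size(edges: List[Edge], part: Part) -> int:
    """Edges crossing segments; a stand-in for Pins(S)."""
    return sum(1 for u, v in edges if part[u] != part[v])


def move_gain(adj, part: Part, v: str, dest: int) -> int:
    """Reduction in cut size if v moves from its segment to dest."""
    src = part[v]
    return sum((part[u] == dest) - (part[u] == src) for u in adj[v])


def two_way_fm(adj, part: Part, a: int, b: int, fits: Fits) -> int:
    """One FM pass between segments a and b; keeps the best prefix of moves."""
    def other(v: str) -> int:
        return b if part[v] == a else a

    free = {v for v, s in part.items() if s in (a, b)}
    moves: List[Tuple[str, int]] = []
    total = best_total = best_len = 0
    while True:
        cands = [(move_gain(adj, part, v, other(v)), v)
                 for v in free if fits(part, v, other(v))]
        if not cands:
            break
        g, v = max(cands)                 # highest gain; ties broken by name
        moves.append((v, part[v]))        # remember the source segment
        part[v] = other(v)                # move v and lock it for this pass
        free.discard(v)
        total += g
        if total > best_total:
            best_total, best_len = total, len(moves)
    for v, src in reversed(moves[best_len:]):  # undo moves after best prefix
        part[v] = src
    return best_total


def k_way(edges: List[Edge], part: Part, n: int, fits: Fits,
          check: Callable[[Part], bool],
          max_iter_cnt: int = 10, max_imp: int = 3) -> Part:
    """Pairwise FM passes until the iteration or no-improvement limit."""
    adj = build_adj(edges)
    best, best_pins = dict(part), cut_size(edges, part)
    iteration_cnt = improve_cnt = 1
    while True:
        for i, j in itertools.combinations(range(n), 2):
            two_way_fm(adj, part, i, j, fits)
        curr_pins = cut_size(edges, part)
        iteration_cnt += 1
        if curr_pins < best_pins or check(part):
            improve_cnt = 1
            best, best_pins = dict(part), curr_pins
        else:
            improve_cnt += 1
        if iteration_cnt >= max_iter_cnt or improve_cnt >= max_imp:
            return best


# Two triangles joined by one edge, started from a poor initial partition.
EDGES = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
         ("b1", "b2"), ("b2", "b3"), ("b1", "b3"), ("a3", "b1")]
start = {"a1": 0, "a2": 0, "b1": 0, "a3": 1, "b2": 1, "b3": 1}
# Hypothetical capacity: at most 4 nodes per segment.
cap4 = lambda p, v, dest: sum(s == dest for s in p.values()) + 1 <= 4
result = k_way(EDGES, dict(start), 2, cap4, check=lambda p: False)
print(sorted(v for v, s in result.items() if s == result["a1"]))  # ['a1', 'a2', 'a3']
print(cut_size(EDGES, result))  # 1
```

The termination test uses `>=` instead of `=` so that the loop still stops if a limit is set below the starting counter value.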
K_WAY(G, Num_of_partitions, Package_ptr)
begin
    Best ← Initialize()
    Min_chips ← Calculate_min_chips()
    if Min_chips > Num_of_partitions then
        N ← Min_chips
    else
        N ← Num_of_partitions
    end if
    S ← Create_initial_partition(N)    /* random partition S = {s_1, ..., s_N} */
    Best ← S
    Compute_pins_on_all_partitions()
    best_pins ← Pins(Best)
    continue_part ← TRUE
    iteration_cnt ← 1
    improve_cnt ← 1
    while continue_part = TRUE do
        for i = 1 to N-1 do
            for j = i+1 to N do
                two_way_fm(s_i, s_j)
            end for
        end for
        curr_pins ← Pins(S)
        iteration_cnt ← iteration_cnt + 1
        status ← Check_chip_constraint(S)
        if curr_pins < best_pins ∨ status = TRUE then
            improve_cnt ← 1
            Best ← S
            best_pins ← curr_pins
        else
            improve_cnt ← improve_cnt + 1
        end if
        if iteration_cnt = MAX_ITER_CNT ∨ improve_cnt = MAX_IMP then
            continue_part ← FALSE
        end if
    end while
    return(Best)    /* retrieve best partition */
end

Figure 3: Algorithm for partitioning (Contd.)
Check_chip_constraint(S)
begin
    status ← TRUE
    for all s_i ∈ S do    /* segments in partition */
        if CLBs(s_i) > CLB_s ∨ pins(s_i) > P_s ∨
           FG(s_i) > FG_s ∨ FF(s_i) > FF_s
        then status ← FALSE
        end if
    end for
    return(status)
end

Figure 4: Algorithm for partitioning (Contd.)
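The per-segment test of Figure 4, together with the device-assignment step described in the text, can be sketched in Python as follows. The dataclasses and the greedy rule (give each segment the smallest still-available device it fits on) are assumptions for illustration, not the method stated by the authors. The CLB, FG, FF and pin figures come from the rows of Table 6; the availability counts are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Device:          # one row of the user's device table (cf. Table 6)
    part: str
    clbs: int
    fgs: int
    ffs: int
    pins: int


@dataclass(frozen=True)
class Segment:         # estimated needs of one partition segment
    clbs: float
    fgs: float
    ffs: float
    pins: int


def fits(seg: Segment, dev: Device) -> bool:
    """Per-segment test of Check_chip_constraint: no limit may be exceeded."""
    return not (seg.clbs > dev.clbs or seg.pins > dev.pins
                or seg.fgs > dev.fgs or seg.ffs > dev.ffs)


def assign_devices(segments: List[Segment], devices: List[Device],
                   counts: Dict[str, int]) -> Optional[List[str]]:
    """Give each segment the smallest still-available device it fits on.

    Returns None (Result = 0) if some segment has no fitting chip.
    """
    left = dict(counts)
    chosen: List[str] = []
    for seg in segments:
        options = [d for d in sorted(devices, key=lambda d: d.clbs)
                   if left.get(d.part, 0) > 0 and fits(seg, d)]
        if not options:
            return None
        left[options[0].part] -= 1
        chosen.append(options[0].part)
    return chosen


DEVICES = [Device("XC4002", 64, 128, 128, 64),
           Device("XC4003", 100, 200, 200, 80),
           Device("XC4005", 196, 392, 392, 112)]
COUNTS = {"XC4002": 1, "XC4003": 2, "XC4005": 1}
print(assign_devices([Segment(70, 140, 54, 40), Segment(50, 100, 40, 30)],
                     DEVICES, COUNTS))  # ['XC4003', 'XC4002']
```

A greedy smallest-first choice keeps large parts free for later segments, but it is not guaranteed to find an assignment whenever one exists.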

6  Implementation  and  results 

We have developed a vertically integrated system for a top-down FPGA synthesis design flow, with the high-level synthesis system DSS as the front end and the multi-way Fiduccia-Mattheyses algorithm used to partition the input design. We used Xilinx XC4000 as our target FPGA family, and used the Synopsys Design Analyzer and the Xilinx XACT design manager tools to produce the gate-level net-list and the programmed bit-map files, respectively, for FPGA implementation. We developed procedures for estimating the resources needed by a register-level design to be implemented on FPGA devices. We used these estimates to produce multiple register-level designs with the Multi-Way FM partitioning algorithm.

Tables 3, 4 and 5 show the actual and estimated resources needed by the data-path, the controller and the complete design, respectively. The estimated and actual flip-flop counts match exactly for the data-path and the controller, since the correct utilizations of the RTL components are provided to the estimator (for the data-path) and the number of flip-flops follows directly from the number of states in the FSM (for the controller). Since the number of flip-flops for the overall design is calculated from those needed by the data-path and the controller, the estimated flip-flop counts for the overall design also match the actual values exactly. On the other hand, the estimated and actual numbers of function generators differ on average by about 6% for the data-path, 11% for the controller and 9% for the complete design. The number of packed CLBs for the complete design differs from the estimates because of the discrepancies in the function generator estimates, which are in turn due to the FSM synthesis methodology used by the logic synthesis tool and to global optimization over function generators. This deviation from the actual values was found to be about 10% on average.

Table 6 shows a sample FPGA device selection provided by the user. It gives the FPGA part number, the number of
chips of each kind available, and the resources available on each chip. Constraints and the corresponding partitions
obtained from the partitioner for a number of designs are shown in Table 7. The design utilizations in this table
refer to the estimated values of the resources needed by each design. In the first example, the design fits in a
single device, and hence the mapping is as shown. For the DCT example with two XC4003 devices available, no
suitable partition can be found, so no partitions are listed; instead, the partitioning system prompts the user to
try larger chips. The execution time of the partitioning tool for these examples ranges from 1.2 s to 4.0 s.
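The partitioning algorithm itself is not described in this section. As a hedged illustration of the single-device case only, the following sketch checks whether a design's estimated utilization fits one part from Table 6 and picks the smallest available part that does. It ignores I/O pins (no I/O utilization is reported in the tables), and ranking parts by CLB count is an assumption; all names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Per-part resources from Table 6: (CLBs, function generators, flip-flops, I/O pins)
PARTS: Dict[str, Tuple[int, int, int, int]] = {
    "XC4002": (64, 128, 128, 64),
    "XC4003": (100, 200, 200, 80),
    "XC4005": (196, 392, 392, 112),
    "XC4010": (400, 800, 800, 160),
}


def fits(clbs: int, fgs: int, ffs: int, part: str) -> bool:
    """True if the utilization fits one chip of `part` (I/O pins are not checked)."""
    cap_clbs, cap_fgs, cap_ffs, _io_pins = PARTS[part]
    return clbs <= cap_clbs and fgs <= cap_fgs and ffs <= cap_ffs


def smallest_single_device(clbs: int, fgs: int, ffs: int,
                           available: Dict[str, int]) -> Optional[str]:
    """Smallest part (by CLB count) with at least one chip available that fits."""
    candidates = [p for p, n in available.items() if n > 0 and fits(clbs, fgs, ffs, p)]
    return min(candidates, key=lambda p: PARTS[p][0], default=None)


# "Num available" column of Table 6
AVAILABLE = {"XC4002": 1, "XC4003": 2, "XC4005": 1, "XC4010": 0}
print(smallest_single_device(70, 140, 54, AVAILABLE))    # TLC estimates -> XC4003
print(smallest_single_device(239, 479, 191, AVAILABLE))  # Find estimates -> None
```

A `None` result corresponds to the situation in which no single available chip suffices and a multi-chip partition (or a larger chip) is needed.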

When a design is partitioned into multiple design units, the delay on nets passing from one unit to another may be
so large that the operating frequency of the overall design is drastically reduced. We are currently extending our
partitioning engine to incorporate delay constraints.


Design     No. of RTL comp's    Function generators    Flip-flops
           in the data-path     Estimate   Actual      Estimate   Actual
TLC        33                   47         44          48         48
SS-prod    34                   423        374         369        369
DCT        23                   157        187         209        209
Find       57                   350        384         184        184

Table 3: Estimated and actual values for data-path


Design     Num of    Control word    Function generators    Flip-flops
           states    length          Estimate   Actual      Estimate   Actual
TLC        ?         40              109        86          6          6
SS-prod    37        40              143        132         6          6
DCT        38        30              129        99          6          6
Find       76        70              182        199         7          7

Table 4: Estimated and actual values for controller


Design     Function generators    Flip-flops            Packed CLBs
           Estimate   Actual      Estimate   Actual     Estimate   Actual
TLC        140        156         54         54         70         78
SS-prod    510        411         375        375        255        205
DCT        258        312         215        215        129        156
Find       479        520         191        191        239        260

Table 5: Estimated and actual values for complete design
FPGA Part   CLBs   Function Generators   Flip-Flops   I/O pins   Num available
XC4002      64     128                   128          64         1
XC4003      100    200                   200          80         2
XC4005      196    392                   392          112        1
XC4010      400    800                   800          160        0

Table 6: Sample FPGA Device Selection.

Note: 'Num available' is specified by the user. The rest of the data is provided to the partitioning tool by a
configuration file.
Design    Design               FPGA package data      Partitions
          utilization          Part (No. available)   Chip1          Chip2          Chip3          Chip4
TLC       CLBs=70(78),         XC4002 (2)             XC4003         -              -              -
          FGs=140(156),        XC4003 (1)             FGs=140(156)
          FFs=54(54)                                  FFs=54(54)
TLC       CLBs=70(78),         XC4003 (2)             XC4003         -              -              -
          FGs=140(156),        XC4004 (1)             FGs=140(156)
          FFs=54(54)                                  FFs=54(54)
SS-Prod   CLBs=255(205),       XC4004 (1)             XC4004         XC4005         -              -
          FGs=510(411),        XC4005 (1)             FGs=204(152)   FGs=306(259)
          FFs=375(375)                                FFs=168(168)   FFs=207(207)
SS-Prod   CLBs=255(205),       XC4005 (2)             XC4005         XC4005         -              -
          FGs=510(411),                               FGs=204(152)   FGs=306(259)
          FFs=375(375)                                FFs=168(168)   FFs=207(207)
DCT       CLBs=129(156),       XC4003 (2)             -              -              -              -
          FGs=258(312),
          FFs=215(215)
DCT       CLBs=129(156),       XC4004 (1)             XC4004         -              -              -
          FGs=258(312),        XC4005 (2)             FGs=258(312)
          FFs=215(215)                                FFs=215(215)
Find      CLBs=239(260),       XC4008 (1)             XC4008         -              -              -
          FGs=479(520),                               FGs=479(520)
          FFs=191(191)                                FFs=191(191)
Find      CLBs=239(260),       XC4004 (2)             XC4004         XC4004         XC4005         XC4005
          FGs=479(520),        XC4005 (2)             FGs=50(57)     FGs=178(203)   FGs=168(173)   FGs=83(87)
          FFs=191(191)                                FFs=51(51)     FFs=23(23)     FFs=84(84)     FFs=33(33)

Table 7: Constraints and results of Partitioner
Note: The resources are represented as estimated value (actual value).
Abstract

Synthesis of pragmatic systems from high-level specifications requires representation and
application of both functional requirements and constraints. This work presents a language
for representing requirements and constraints in VHDL design representations and a prototype
case-based synthesis system. VSPEC is an annotation language for VHDL developed to support
axiomatic representation of requirements for system synthesis. VSPEC descriptions serve as
synthesis goals and verification criteria. A prototype case-based synthesis system is also presented
that uses VSPEC requirements both as goal statements and as descriptions of potential solutions. This
prototype system demonstrates how synthesis can be performed at the systems level and how
constraints can be used to implement a simple concurrent engineering process.
1 Introduction

VSPEC is motivated by the need to specify digital system requirements in an implementation-independent
fashion. Qualitatively, system requirements specify “what” a system should achieve
without specifying “how” it should be done. Design specifications are developed from requirements
and describe “how” requirements are implemented. Although VHDL [14] supports specification of
specific designs, it does little to support requirements specification. In addition, VHDL does not
support a consistent representation of constraints. Thus, requirements specification in VHDL and
systems-level synthesis from VHDL specifications are not practical activities.

The lack of requirement and constraint specification has little effect when designing systems that require
few levels of abstraction; excellent VHDL synthesis tools exist at the RTL level. However,
there is a growing need for systematic design of very large, abstractly defined systems. Without
constraint information and a precise definition of requirements, effective systems engineering and
concurrent engineering are impossible, and automated synthesis is harder still.

With requirements and constraints specified, some degree of systems-level design synthesis
is possible. The case-based synthesis system presented here demonstrates how constraints can be
integrated into an automated design process. System synthesis proceeds in a typical, function-oriented
manner. However, constraints help rank potential solutions during case retrieval and are verified
at each level of abstraction as the design progresses.

This work concentrates on two subjects: the VSPEC language and a prototype synthesis system.
First, the VSPEC interface language is presented. The general syntax and notations used by
VHDL and VSPEC are discussed, followed by an example specification. Specific attention is given
to describing how VSPEC represents both functional and non-functional system requirements. Also
discussed is the relationship between algebraic specification and VSPEC, with a presentation of
synthesis goal derivation. Second, a case-based reasoning synthesis process is presented. The basics of
the technique are explained via an annotated example that highlights the role of constraints in
the synthesis process. To conclude, perceived limitations of the synthesis system and our current
research directions are described.

2  Design  and  Requirements  Specification 

Three basic constructs are used to specify a design in VHDL: (1) the entity specifies the interface
of a system; (2) the architecture specifies the behavior and/or structure of a system; and (3)
the configuration associates a specific architecture with an entity. The designer specifies
a device interface using the entity construct, develops one or more behavioral or structural
descriptions using the architecture construct, and selects a specific implementation for the entity using the
configuration construct.

Each architecture represents a potential design at some level of abstraction. Behavioral
specifications describe the behavior of a solution using an Ada-like programming language. Structural
specifications indicate how components are composed to construct a solution. In both cases, specific
candidate designs are represented.

Representation of system requirements in VHDL is restricted to an operational style: a “program”
is written that describes an artifact having the desired characteristics. Although the operational
style is an excellent means of describing specific designs, it is not well suited to describing system
requirements, for several reasons:

1. It forces representation of a specific design, thus introducing implementation bias.

2. It does not adapt easily to representation of performance constraints.

3. Unimportant characteristics are indistinguishable from required characteristics of the design.

4. Users must deal with unnecessary detail.

2.1 VSPEC Requirements Specification

Figure 1a is an example VHDL entity representing a component that searches a collection of records
for a specific record. Note that there is no indication of what the component must accomplish or what
performance constraints exist for it. The result is a black-box view of the component with no
indication of requirements, as shown in Figure 1b. An architecture can be developed, but such an
architecture exhibits the negative characteristics described above.

A  solution  to  requirements  representation  in  VHDL  is  VSPEC,  a  Larch  interface  language  [8] 
developed for VHDL synthesis. The Larch family of specification languages consists of a collection
of  application  specific  interface  languages  and  a  common  shared  language.  Each  interface  language 
defines  sets  of  specification  primitives  containing  useful  constructs  in  a  target  application  language. 
The  shared  language  serves  two  purposes.  First,  it  provides  a  target  formal  system  for  translating 
interface  specifications.  Second,  it  provides  a  language  for  writing  auxiliary  specifications  and 
handbooks  of  common  components. 

The traditional shared language is a first-order algebraic language called LSL. In VSPEC, the
primary shared language is REFINE [1], due to its support for transformation and synthesis, its
formal basis, and its potential for execution.

Figure 2a shows the VSPEC representation for the same search as the VHDL entity in Figure 1a.
The added clauses specify input conditions, output conditions and constraints. Figure 2b shows a
graphical representation of the same information. The VSPEC definition indicates that Vcc must be
less than or equal to 5 and that the area (x × y) must be less than 0.3. No constraints are placed
on heat dissipation (H), clock speed (Clk) or timing.

The specification associated with Figure 2 avoids many of the problems with the operational
specification style. A search routine is specified independently of any implementation by the
ensures clause. The designer need not be concerned with the details of the search algorithm
at the requirements level. Only characteristics necessary for specifying a search are included.
Constraints are clearly specified in the constrained by clause and do not interfere with the functional
specification.

2.2  The  VSPEC  entity 

All VSPEC annotations affect only the VHDL entity; no changes are made to architecture
structures or any other VHDL structure. VSPEC clauses are grouped into four broad classes: (1)
those that define a device's function; (2) those that define internal state variables; (3) those that
define constraints; and (4) those that relate VHDL data structures to formal representations.

2.2.1  VSPEC  Clauses  and  Logic 

The  general  form  of  a  VSPEC  clause  is  a  keyword  followed  by  a  logical  sentence.  The  keywords 
indicate  what  requirement  the  logical  sentence  specifies.  Each  logical  sentence  is  written  in  typed 
first-order  predicate  calculus  with  extensions  to  the  logic  that  allow  the  use  of  sets  and  sequences  in 
specifications.  The  logic  follows  the  basic  syntax  of  Refine,  the  language  used  for  system  synthesis, 
to  support  easy  translation  and  some  degree  of  execution. 

There are six basic VSPEC clauses:

- requires - specifies sufficient conditions on inputs and state for entity execution

- ensures - specifies necessary conditions on outputs and state following entity execution

- constrained by - specifies non-functional performance constraints

- modifies - specifies what the entity may alter

- based on - associates VHDL data types with REFINE definitions

- state - defines a collection of variables that represent the entity's internal state

VSPEC clauses may only access variables and signals defined in the entity port, in the state
clause, or quantified in a logical expression. VSPEC is strongly typed, and all variables, including
those bound by quantifiers, must have an associated type. Although REFINE allows type inferencing,
VSPEC does not.
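To make the six clauses and the access rule concrete, here is a hypothetical in-memory record for a VSPEC entity. It is a sketch only: logical sentences are kept as plain strings, and the caller supplies the names a sentence uses and the types of any quantifier-bound variables, since parsing VSPEC is out of scope. The class, field and method names are assumptions, not part of VSPEC.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class VspecEntity:
    """Hypothetical in-memory form of a VSPEC-annotated VHDL entity."""
    name: str
    ports: Dict[str, str]                                   # port name -> VHDL type
    state: Dict[str, str] = field(default_factory=dict)     # state clause: name -> type
    requires: str = "true"                                  # precondition sentence
    ensures: str = "true"                                   # postcondition sentence
    constrained_by: str = "true"                            # performance constraints
    modifies: List[str] = field(default_factory=list)       # what the entity may alter
    based_on: Dict[str, str] = field(default_factory=dict)  # VHDL type -> REFINE definition

    def illegal_references(self, used: Set[str], bound: Dict[str, str]) -> Set[str]:
        """Names used by a clause that are neither ports, state variables,
        nor quantifier-bound variables with an explicit type."""
        typed_bound = {v for v, t in bound.items() if t}
        return used - set(self.ports) - set(self.state) - typed_bound
```

An empty result means the clause respects both the access rule and the no-type-inference rule; an untyped quantified variable shows up as illegal.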

2.2.2  Functional  Requirements 

The functional requirements of a VSPEC entity are defined using the requires and ensures
clauses. The requires clause specifies a logical expression, I(x), that must be true for the entity
to perform its operation. The ensures clause specifies necessary state conditions, O(x, z), resulting
from entity execution given a particular input. Formally, any architecture implementing an
entity must obey the condition:

$$\forall x : D \cdot I(x) \Rightarrow O(x, F(x)) \qquad (1)$$

where D is the domain of the transform and F(x) is the transformation performed by the architecture.

The  requires  Clause 

The  requires  clause,  I(x),  is  a  logical  expression  defined  over  all  ports,  signals  and  variables  that 
may  provide  input  to  the  transform.  I(x)  is  true  when  x  is  a  valid  input.  I(x)  is  a  precondition 
for  entity  execution.  When  it  is  true,  the  entity  must  produce  valid  output. 

The  ensures  Clause 

The ensures clause, O(x, z), is a logical expression defined over all ports, signals and variables.
O(x, z) is true when z is a valid output given x as input. O(x, z) is a postcondition for entity
execution and states necessary conditions placed on entity outputs and state variables.
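For a finite domain, condition (1) can be checked by brute-force enumeration. The sketch below assumes the requires and ensures predicates and the transform are ordinary Python callables; it is not a VSPEC tool, and for infinite domains the condition must be proved rather than tested. The integer square root example is hypothetical.

```python
import math
from typing import Callable, Iterable, TypeVar

X = TypeVar("X")
Z = TypeVar("Z")


def satisfies_spec(domain: Iterable[X],
                   requires: Callable[[X], bool],
                   ensures: Callable[[X, Z], bool],
                   transform: Callable[[X], Z]) -> bool:
    """Check  forall x : D . I(x) => O(x, F(x))  by enumerating a finite domain.
    F is only applied to inputs that satisfy the precondition I."""
    return all(ensures(x, transform(x)) for x in domain if requires(x))


# Hypothetical example: integer square root specified axiomatically.
def requires_nonneg(x: int) -> bool:
    return x >= 0


def ensures_isqrt(x: int, z: int) -> bool:
    return z * z <= x < (z + 1) * (z + 1)


print(satisfies_spec(range(-5, 100), requires_nonneg, ensures_isqrt, math.isqrt))        # True
print(satisfies_spec(range(-5, 100), requires_nonneg, ensures_isqrt, lambda x: x // 2))  # False
```

Because the precondition filters inputs before F is applied, `math.isqrt` is never called on the negative values in the range.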

2.2.3  Constraints 

Constraints  express  characteristics  an  entity  must  exhibit  that  are  not  a  part  of  its  function.  For 
example,  heat  dissipation  constraints  frequently  affect  selection  of  valid  designs,  but  heat  is  a  side 
effect  of  the  technology.  It  has  little  to  do  with  the  input  and  output  relationships  specified  in  the 
requires  and  ensures  clauses. 

Although constraints do not affect function, they are critical in hardware system design. In
VSPEC there are two sources of constraints. The first is the constrained by clause, which specifies
several performance constraints common in hardware design. The second is the modifies clause,
which limits what the entity can alter in performing its function.

The  constrained  by  Clause 

The constrained by clause is a conjunction of relations between predefined variables and fixed values.
VSPEC currently supports constraint information for heat dissipation, area, clock speed,
power consumption and pin-to-pin timing. To specify a constraint, one chooses a constraint type
and uses it in a relation. For example, to specify heat dissipation of at most 1 watt and power
consumption of at most 10 watts, the logical sentence heat =< 1 and power =< 10 is included in
the constrained by clause.

Timing requires a somewhat more complicated representation. Here one specifies an interval
between two pins, then relates that interval to a constant time. For example, (a<->b) =< 10
specifies that the time between a signal arriving at port a and port b producing a signal must be
at most 10 time units.
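A constrained by clause is a conjunction of relations, so checking a candidate design against it amounts to evaluating each relation against the candidate's measured or estimated values. In this sketch (all names hypothetical) each relation is a (quantity, operator, bound) triple, a pin-to-pin interval is named by a string such as 'a<->b', and units are left unspecified as in the text.

```python
import operator
from typing import Dict, List, Tuple

# Relational operators as written in the constrained by examples ("=<" means at most)
OPS = {"=<": operator.le, "<": operator.lt, ">=": operator.ge,
       ">": operator.gt, "=": operator.eq}

Constraint = Tuple[str, str, float]  # (quantity, operator, bound)


def violated(constraints: List[Constraint], measured: Dict[str, float]) -> List[str]:
    """Return the relations of a constrained by conjunction that a candidate fails.
    The conjunction holds exactly when the returned list is empty."""
    failures = []
    for quantity, op, bound in constraints:
        if quantity not in measured:
            failures.append(f"{quantity}: no value")
        elif not OPS[op](measured[quantity], bound):
            failures.append(f"{quantity} {op} {bound}")
    return failures


spec = [("heat", "=<", 1), ("power", "=<", 10), ("a<->b", "=<", 10)]
print(violated(spec, {"heat": 0.8, "power": 9.5, "a<->b": 12}))  # ['a<->b =< 10']
```

Treating a missing value as a failure is a conservative design choice: a candidate is accepted only when every relation can be evaluated and holds.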
The modifies Clause

The  modifies  clause  specifies  a  collection  of  ports,  signals  and  variables  that  may  be  modified  by 
the  entity.  The  modifies  clause  indicates  what  effects  and  side  effects  are  allowed.  Only  outputs 
may  be  specified  in  a  modifies  clause.  Of  particular  interest  is  the  ability  to  specify  the  direction 
of  buffer  type  ports. 

2.2.4  Abstract  Data  Types 

The semantics of VHDL data types must be defined before reasoning about their properties is
possible.  Elemental  data  types  such  as  integer  and  bit  have  definitions  loaded  as  a  part  of  the 
VSPEC  system.  Thus,  when  using  a  basic  VHDL  type,  the  semantics  of  that  type  are  present  by 
default. 

The  based  on  Clause 

User-defined data types such as arrays and records must be defined as a part of the definition process
because they cannot be defined a priori. This is accomplished using the based on predicate. The
logical expression defined in a based on clause defines the semantics of a user-defined type. To
support this specification process, VSPEC includes standard schemas for defining sets, sequences,
arrays and tuples. These schemas are used in conjunction with parameter morphism to define
associated VHDL types specific to user needs.

2.2.5  System  State 

The notion of system state is typically not supported directly by axiomatic specification techniques.
A computation unit is defined by a transform that relates inputs to outputs. Thus, to include state
in a specification, it must be specified as both an input and an output of the transform. However,
hardware designers find state-based specification natural, whereas requiring the state
representation to be an input to the VHDL entity is not. Using the two-tiered specification
approach, state can be managed by: (a) supporting the definition of local state variables; and (b)
using the state-maintaining features of port signals. Instead of specifying a function that maps input
signals defined in the port definition to outputs in the same port definition, one specifies a function that
maps inputs and state-maintaining objects to outputs and state-maintaining objects.
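The idea of mapping inputs and state-maintaining objects to outputs and state-maintaining objects can be sketched as a transform that takes the old state and returns the new one alongside the output. The accumulator below is a hypothetical example, not taken from the paper; the postcondition relates input, old state, output and new state and is checked on every invocation.

```python
from typing import Callable, List, Tuple

State = int
Transform = Callable[[int, State], Tuple[int, State]]


def accumulate(x: int, total: State) -> Tuple[int, State]:
    """Hypothetical accumulator entity: (input, state) -> (output, next state)."""
    new_total = total + x
    return new_total, new_total


def ensures(x: int, total: State, out: int, new_total: State) -> bool:
    """Postcondition over the input, old state, output and new state."""
    return new_total == total + x and out == new_total


def run(f: Transform, inputs: List[int], initial: State) -> List[int]:
    """Invoke the entity repeatedly; the state carries over between invocations."""
    outputs, state = [], initial
    for x in inputs:
        out, nxt = f(x, state)
        if not ensures(x, state, out, nxt):
            raise RuntimeError(f"postcondition violated for input {x}")
        outputs.append(out)
        state = nxt
    return outputs


print(run(accumulate, [1, 2, 3], 0))  # [1, 3, 6]
```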

The  state  clause 

The  state  clause  is  a  collection  of  variables  that  store  state  within  a  VSPEC  entity.  Like  VHDL 
variables  and  signals,  these  variables  maintain  their  values  from  one  invocation  of  the  entity  to 
the  next.  All  state  variables  are  defined  locally  and  are  not  visible  outside  the  entity. 

Ports 

Variables defined in an entity's port definition may maintain their state. Port variables of mode buffer
may be inputs or outputs and are not re-initialized unless a signal of some type is driving them.
Port variables of mode out and inout also maintain their state.

2.3 Generic Architectures in VSPEC

VSPEC supports representation of high-level, abstract architectures using the architecture
construct from VHDL. No modifications or annotations are necessary: one simply specifies the entity
structures accessed by the architecture using VSPEC.

Figure 3 represents a two-component architecture for solving the element search problem. The
search entity is identical to the one in Figure 2a, which serves as the starting point for designing
the system. The next step is creating a VHDL architecture that solves the problem specified by
the VSPEC entity. In this example, architecture structure solves this problem by breaking it up
into two sub-components: one that sorts the input and one that retrieves the proper element
from the sorted list. This architecture was generated using the synthesis technique discussed in
Section 3. Breaking the problem into two sub-components yields two new VSPEC entities
that describe the sub-components. Notice that the functional and performance constraints of the
sub-components together meet the constraints specified by the search entity. The next
step in the design process is to generate VHDL architectures for each of these sub-components. The
behavior architecture is an example of a solution for the bin_search entity.
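The two-component decomposition of the search entity can be mimicked in Python: one function sorts the records, a second performs a binary search on the sorted list, and the composite wires the first output to the second input. This is a hedged analogy to the structure architecture in Figure 3, not its VHDL; the (key, payload) record layout and the function names are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple

Record = Tuple[int, str]  # hypothetical record layout: (key, payload)


def sort_component(records: List[Record]) -> List[Record]:
    """First sub-component: order the records by key."""
    return sorted(records, key=lambda r: r[0])


def bin_search(ordered: List[Record], k: int) -> Optional[Record]:
    """Second sub-component: binary search on a list sorted by key."""
    keys = [r[0] for r in ordered]  # key list built here only to use bisect simply
    i = bisect.bisect_left(keys, k)
    if i < len(ordered) and keys[i] == k:
        return ordered[i]
    return None


def search_structure(records: List[Record], k: int) -> Optional[Record]:
    """Composite: the output of sort_component feeds the input of bin_search."""
    return bin_search(sort_component(records), k)


print(search_structure([(7, "g"), (2, "b"), (5, "e")], 5))  # (5, 'e')
```

The composite is correct only because the postcondition of the sorter (the list is ordered) implies the precondition of the binary search, which is the kind of sub-component agreement the text describes.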

2.4  VSPEC  and  Algebraic  Specification 

Any VSPEC definition can be transformed into a formal definition. The form of this definition is
an algebraic specification based on an extension of domain theories as defined in CYPRESS [18] and
KIDS [19, 20]. The basic form of a domain theory is a tuple consisting of the function domain ($D$),
range ($R$), input precondition ($I(x : D)$) and output postcondition ($O(x : D, z : R)$), commonly
referred to as a DRIO model. The DRIO model for any VSPEC entity can be constructed using
the following rules:

$D = d_1 \times d_2 \times \cdots \times d_n$, where $d_i$ is the sort (defined by the based on clause) representing the type
associated with an in, inout, or buffer port, or a state variable

$R = r_1 \times r_2 \times \cdots \times r_m$, where $r_j$ is the sort representing the type associated with an out, inout,
or buffer port listed in the modifies clause, or a state variable

$I(x : D) = I_v(x : D)$, where $I_v(x : D)$ is the logical sentence defined by the requires clause

$O(x : D, z : R) = O_v(x : D, z : R)$, where $O_v(x : D, z : R)$ is the logical sentence defined by the
ensures clause
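The D and R rules are mechanical enough to sketch directly. The following assumes a hypothetical Port record carrying the port mode and the sort from its based on clause; it reads the modifies condition as applying to out, inout and buffer ports alike, which is one interpretation of the rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Port:
    name: str
    mode: str  # "in", "out", "inout" or "buffer"
    sort: str  # sort given by the based on clause


def drio_domain_range(ports: List[Port], modifies: Set[str],
                      state: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Build the sort lists whose products form D and R in the DRIO model."""
    domain = [p.sort for p in ports if p.mode in ("in", "inout", "buffer")]
    rng = [p.sort for p in ports
           if p.mode in ("out", "inout", "buffer") and p.name in modifies]
    # State variables appear on both sides of the transform.
    domain += list(state.values())
    rng += list(state.values())
    return domain, rng
```

Assuming the search entity has ports input and k of mode in and output of mode out, with output in its modifies clause, this yields D = sequence(element) × integer and R = element, matching the DRIOC form derived in Section 3.1.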
Additionally, constraints must be defined as a part of the algebraic statement. The simplest
means of accomplishing this is to include predicates representing constraints in the output function
of the DRIO. However, constraints are not functional; specifying them in their own clause is
an attempt to separate constraint from function. Moreover, constraints in their current form do
not depend on variables defined in the entity¹. Thus, constraints are added to the DRIO model
through a specification extension that adds logical representations of constraints. Effectively, the
DRIO model becomes a DRIOC model:

$C(c_1 : C_1, \ldots, c_n : C_n) = C_v(c_1 : C_1, \ldots, c_n : C_n)$, where $c_i$ is a constraint variable such as heat or
area, $C_i$ is the sort associated with a constraint variable, and $C_v$ is the logical expression defined
in the constrained by clause

With the addition of constraints, the goal of the design activity becomes finding an architecture
that performs the transform $F : D \to R$ such that:

$$\forall x : D \cdot I(x) \Rightarrow O(x, F(x)) \wedge C(c_1, \ldots, c_n) \qquad (2)$$

Thus,  the  goal  of  the  synthesis  activity  is  generation  of  a  transform  mapping  the  current  state 
and  inputs  into  the  next  state  and  outputs  such  that  the  output  condition  and  constraints  are 
satisfied. 

3  System  Synthesis 

The case-based reasoning model used by the synthesis system is based on the standard approach of
retrieval, adaptation, and evaluation, and each of these activities is described in the following sections.
The similarity metric, features and feature types are described first, followed by a description of the
three-stage retrieval process. Adaptation via rule application and by replacing case components is
described next, followed by a brief description of the evaluation process.²

¹ A more complex constraint model could certainly include variables and signals. Our current constraint model
does not allow this.

Given a VSPEC specification and its equivalent DRIOC model, planning techniques apply to
system synthesis. The general goal of planning is to accumulate a partially ordered bag of actions
that achieves the end result. This goal is analogous to the design of general systems, where one is
searching for a collection of interconnected devices that solves a problem. Effectively, I and O define
pre- and post-conditions for a component. In planning terms, this is identical to the description of a
goal or plan action. Consider the goal of system synthesis described in Equation 1. This is exactly
the goal of a planning system: given a pre-condition, find a sequence of actions that necessarily
implies a desired post-condition.

Although any number of planning techniques apply, case-based planning is discussed here. A
method derived from the ASP-II [4] analysis planner and refined in BENTON [2, 3] is applied.
The ASP-II planner used case-based reasoning to synthesize simulation actions given characteristics
described in a before clause (pre-condition) and an after clause (post-condition). ASP-II supports
the replacement of failed actions in a plan using a technique called adaptation by re-planning.

Adaptation by re-planning works by inferring a goal from the state change caused by a plan
action. The system state before the action executes is known from the post-condition of the
preceding action, and the system state after execution is known from the pre-condition of the following
action. Thus, if an action or sequence of actions fails, the goal of the action can be recovered and used
as a goal for a new planning process.
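Adaptation by re-planning can be sketched with plan actions described by pre- and post-condition sets. The sketch recovers the goal of a failed action from its neighbours, as described above, and searches a library for a replacement. Representing conditions as sets of atomic facts and using set inclusion in place of logical implication are simplifying assumptions; ASP-II and BENTON are not reproduced here, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Action:
    """A plan action (here: a component) described by pre- and post-conditions."""
    name: str
    pre: FrozenSet[str]
    post: FrozenSet[str]


def infer_goal(plan: List[Action], failed: int) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """State before plan[failed] = post-condition of the preceding action;
    state required after it = pre-condition of the following action.
    At either end of the plan, fall back to the action's own condition."""
    before = plan[failed - 1].post if failed > 0 else plan[failed].pre
    after = plan[failed + 1].pre if failed + 1 < len(plan) else plan[failed].post
    return before, after


def replan(plan: List[Action], failed: int, library: List[Action]) -> Optional[List[Action]]:
    """Replace the failed action with the first library action achieving its goal."""
    before, after = infer_goal(plan, failed)
    for candidate in library:
        # candidate is applicable in `before` and establishes everything in `after`
        if candidate.pre <= before and after <= candidate.post:
            return plan[:failed] + [candidate] + plan[failed + 1:]
    return None
```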

In our case-based synthesis system, VHDL components are analogous to plan actions and VHDL
architecture structures are analogous to composite plans. The DRIOC form of VSPEC requirements
expresses exactly what a plan action does: I(x) expresses a precondition and O(x, z)
expresses a postcondition. Thus, VSPEC requirements can be used to generate goals for synthesis
processes that replace components, analogously to plan actions in ASP-II.

² For a more formal description of the retrieval and adaptation processes, please refer to the BENTON case-based
reasoning component [3].

3.1  Example  Problem 

To demonstrate the case-based reasoning technique, synthesis of a VHDL component implementing
a search system will be used as an example. Figure 4 represents the VSPEC requirements for the
searching component. This requirements specification states that a list of elements and a key
are input and that the element associated with the key is output. The requires clause states that no
preconditions exist on the input set. (Note that the entity port definition ensures that inputs are of
the correct type.) The ensures clause specifies that if an element in the input array has a key
value equal to k, that element is returned by the function.

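The VSPEC entity is parsed and the result is a DRIOC specification of the following form:

$D = \mathit{sequence}(\mathit{element}) \times \mathit{integer}$

$R = \mathit{element}$

$I(x) = \mathit{true}$

$O(x, z) = \forall e : \mathit{element} \cdot \mathit{output} = e \Leftarrow e \in \mathit{input} \wedge k = \mathit{key}(e)$

$C = \mathit{power} < 10$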
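The DRIOC tuple above translates almost directly into predicates. In this sketch (the element layout and all names are hypothetical) a linear search plays the role of a candidate transform F. Read literally, O holds vacuously when no element has key k, and with duplicate keys it can only hold if all matching elements are equal; the sketch keeps that literal reading.

```python
from typing import Optional, Sequence, Tuple

Element = Tuple[int, str]  # hypothetical element layout: (key, payload)


def key(e: Element) -> int:
    return e[0]


def pre_I(inp: Sequence[Element], k: int) -> bool:
    return True  # I(x) = true: no precondition on the input


def post_O(inp: Sequence[Element], k: int, output: Optional[Element]) -> bool:
    # forall e : element . output = e  <=  (e in input and k = key(e))
    return all(output == e for e in inp if key(e) == k)


def constraint_C(power: float) -> bool:
    return power < 10  # C = power < 10


def linear_search(inp: Sequence[Element], k: int) -> Optional[Element]:
    """One candidate transform F: return the first element whose key is k."""
    for e in inp:
        if key(e) == k:
            return e
    return None
```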

3.2  Cases 

Each stored case is a triple consisting of a problem description, a feature set, and a potential solution.
The problem description is the DRIOC translation of the VSPEC requirements, the feature set is
domain specific and derived from the DRIOC, and the solution is a VHDL specification fragment
annotated with VSPEC that satisfies the problem description. The case-base is a set of cases and
associated indexes used to retrieve cases efficiently.




3.3  Retrieval  and  Similarity 

When presented with a new problem, the case-based synthesis process begins its problem-solving
activity by retrieving one or more similar cases from the case-base. Retrieval is a three-step process
of: (a) generating a feature set for the new problem; (b) retrieving functionally correct solutions;
and (c) determining the most similar functionally correct solution.
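The three retrieval steps can be sketched as follows. The system's similarity metric is not given in this section, so exact agreement on designated functional features for step (b) and a weighted count of the other matching features for step (c) are assumptions made for illustration; the Case fields mirror the triple of Section 3.2, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

Features = Dict[str, object]


@dataclass
class Case:
    """A stored case: problem description, feature set, potential solution."""
    problem: str       # DRIOC translation, kept opaque in this sketch
    features: Features
    solution: str      # VSPEC-annotated VHDL fragment, kept opaque


def retrieve(new: Features, case_base: List[Case], functional: Sequence[str],
             weights: Dict[str, float], k: int = 1) -> List[Case]:
    # Step (a), generating `new` from the problem, is assumed done by the caller.
    # Step (b): keep cases that agree on every functional feature.
    correct = [c for c in case_base
               if all(c.features.get(f) == new.get(f) for f in functional)]

    # Step (c): rank survivors by the summed weights of other features they share.
    def similarity(c: Case) -> float:
        return sum(w for f, w in weights.items()
                   if f not in functional and f in new and c.features.get(f) == new[f])

    return sorted(correct, key=similarity, reverse=True)[:k]
```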

3.3.1  Features  and  Feature  Types 

A  feature  type  represents  information  common  to  features  representing  the  same  characteristic.  A 
set  of  feature  types  exists  for  each  case-base.  Each  feature  type  has  a  unique  name  and  describes 
how  to  compare  features  of  that  type,  the  relative  importance  of  the  feature,  and  how  to  generate 
the  feature  from  a  problem  description. 

The  following  is  the  feature  type  definition  for  the  input-types  feature.  The  comparison 
function  is  bag-equal,  its  relative  weight  is  1.0,  and  generate-input-types  constructs  features 
of  this  type  from  problems. 

<'input-types, 'bag-equal, 1.0, 'generate-input-types>

Sets of features describe problems and facilitate retrieval and comparison of problems. A feature
is an attribute-value pair where the attribute names a unique feature type and the value is the
feature value. A feature is legal if and only if it names a known type. An example of a legal
input-types feature for an entity accepting an integer and a sequence of elements as inputs would
be:

<'input-types, ["integer", "seq(element)"]>
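A feature type as described above bundles a name, a comparison function, a weight and a generator. The sketch below is a hypothetical Python rendering of the input-types feature type: bag-equal is taken to mean multiset equality (order ignored, multiplicity respected), and the generator assumes the problem exposes its domain sorts under a 'D' key.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable, Dict, Sequence, Tuple


def bag_equal(a: Sequence[Any], b: Sequence[Any]) -> bool:
    """Multiset equality: same elements with the same multiplicities, order ignored."""
    return Counter(a) == Counter(b)


@dataclass(frozen=True)
class FeatureType:
    name: str                                  # unique feature type name
    compare: Callable[[Any, Any], bool]        # how two feature values are compared
    weight: float                              # relative importance
    generate: Callable[[Dict[str, Any]], Any]  # builds the feature from a problem


def generate_input_types(problem: Dict[str, Any]) -> Tuple[str, ...]:
    """Hypothetical generator: the sorts making up the problem's domain D."""
    return tuple(problem["D"])


INPUT_TYPES = FeatureType("input-types", bag_equal, 1.0, generate_input_types)
KNOWN_TYPES = {INPUT_TYPES.name: INPUT_TYPES}


def is_legal(feature: Tuple[str, Any]) -> bool:
    """A feature (attribute, value) is legal iff its attribute names a known type."""
    return feature[0] in KNOWN_TYPES
```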

Features and feature types are defined based on the VSPEC descriptions. When converted, a VSPEC
description is a collection of logical expressions and argument list definitions. The goal of
the synthesis problem is finding a component whose behavior and performance meet the VSPEC
requirements. In case-based reasoning terms, this translates to finding a component whose VSPEC
description is similar to the problem requirements and adapting that solution to the specific problem.
Because VSPEC is formal, the DRIOC elements could be used as features and logical implication
used as a matching function: when corresponding elements of two DRIOC descriptions are logically
equivalent, they match.

The logical equivalence approach to comparing VSPEC descriptions is appealing because feature
generation is trivial, the features are general to any domain using VSPEC descriptions, and the
comparison is formal. However, logical inference is computationally impractical when considering
large case-bases.

The solution adopted here is to define features for the specific application domain, generate those
features from VSPEC, and use them for comparison. Generality and formal comparison are lost with
this method. However, the efficiency gained from using simple comparisons makes the system far
more pragmatic.

A collection of feature types for the DSP domain is currently under development for this system.
The following is a short list of feature types used in later examples:

input-types      sequence of input types
output-types     sequence of output types
heat             heat dissipation
power            power consumption
fft              computes FFT
ordered(x)       x is ordered
permute(x,y)     x is a permutation of y
search           is a search system

3.3.2 Feature Generation


When a new problem is presented to the synthesis system, a set of features is generated. The
feature generation function of each feature type is applied to the new problem, and the resulting
features comprise the problem’s feature set. Feature generation functions are represented as Refine
functions. The generation functions are maintained in a list and applied to each new problem
in a predetermined order, which avoids the need for conflict resolution.
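Feature generation, as described here, amounts to running a fixed, ordered list of generator functions over the problem. This Python sketch assumes a dictionary problem description with hypothetical `domain`, `range`, and `is_search` entries. The generator names are illustrative stand-ins for the system's Refine functions.

```python
from typing import Any, Callable, Dict, List, Tuple

Problem = Dict[str, Any]
Feature = Tuple[str, Any]


# Hypothetical generation functions; the real system defines these in Refine.
def gen_input_types(p: Problem) -> Feature:
    return ("input-types", p["domain"])


def gen_output_types(p: Problem) -> Feature:
    return ("output-types", p["range"])


def gen_search(p: Problem) -> Feature:
    return ("search", p.get("is_search", False))


# The list order is fixed, so the same problem always yields the same feature
# set; if two generators ever produced the same feature name, the later one in
# the list would simply overwrite the earlier one.
GENERATORS: List[Callable[[Problem], Feature]] = [
    gen_input_types,
    gen_output_types,
    gen_search,
]


def generate_features(problem: Problem, generators=GENERATORS) -> Dict[str, Any]:
    features: Dict[str, Any] = {}
    for generate in generators:
        name, value = generate(problem)
        features[name] = value
    return features


search_problem = {"domain": ["sequence(element)", "integer"],
                  "range": ["element"], "is_search": True}
print(generate_features(search_problem))
```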

The following function generates the input-types feature. It returns an attribute-value pair
consisting of the input-types feature name and the value stored in the domain slot of the problem
description.


function generate-input-types (p : problem) : feature =
  <'input-types, p.domain>;


The following is a subset of the features generated for the search problem:

{<input-types, [sequence(element), integer]>,
 <output-types, [element]>,
 <power, <<=, 10>>,
 <search, true>,
 <fft, false>,
 <ordered, false>}


The first two features represent D and R and indicate what the retrieved case must input and
output. The comparison function for each is bag equality, indicating that the arity and the input
and output types must match.

Other  features  are  defined  based  on  I  and  O.  No  features  are  generated  from  I  because  it 
is  always  true.  Effectively,  there  are  no  input  preconditions  and  the  component  should  work  on 
all  inputs  of  the  correct  type.  The  output  postcondition,  O,  does  provide  information  about  the 
desired  results  of  applying  this  component  by  defining  a  search  routine. 

Finally, features are generated from C. The constrained by clause must be a conjunction
of simple relations, and each relation forms one feature. The type of each feature names the
constrained quantity, and the relation and value form a pair that serves as the feature value; for
example, power <= 10 yields <power, <<=, 10>>. Simple interval arithmetic is used to compare
specific feature values.
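The text does not spell out how interval arithmetic compares constraint values. One plausible reading is sketched here in Python under stated assumptions. Each (relation, bound) pair is mapped to a closed interval of a non-negative quantity. A case satisfies a requirement when its interval lies inside the required one. Only `<=`, `>=`, and `=` are handled; strict relations would need open intervals. The function names are hypothetical.

```python
import math
from typing import Tuple

Interval = Tuple[float, float]


def to_interval(relation: str, bound: float) -> Interval:
    """Map a (relation, bound) constraint value to a closed interval.
    Assumes the constrained quantity (power, area, heat) is non-negative."""
    if relation == "<=":
        return (0.0, bound)
    if relation == ">=":
        return (bound, math.inf)
    if relation == "=":
        return (bound, bound)
    raise ValueError(f"unsupported relation: {relation}")


def satisfies(case_value, required_value) -> bool:
    """True if every value the case allows also meets the requirement,
    i.e. the case's interval lies inside the required interval."""
    lo_c, hi_c = to_interval(*case_value)
    lo_r, hi_r = to_interval(*required_value)
    return lo_r <= lo_c and hi_c <= hi_r


print(satisfies(("<=", 4), ("<=", 10)))   # True
print(satisfies(("<=", 11), ("<=", 10)))  # False
```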
3.3.3 Problem Similarity

In a case-based reasoning system, problem similarity indicates the level of confidence that two
problems share a solution. The similarity measure used here rests on two premises. First, similarity
is proportional to the number of common characteristics with matching values. Second, similarity
is proportional to the amount of information involved in the comparison.

The similarity measure implements these premises as the raw similarity and the possible match
ratio, respectively. Raw similarity measures how many shared features match. Two features
match if they are of the same type and their values are equal according to the feature type’s
comparison function. The possible match ratio measures how many feature types are shared by
the two feature sets. A feature or feature type is shared by two feature sets if there is a feature
of that type in both sets. Given two sets of features, similarity is the product of the raw similarity
value and the possible match ratio.

Raw  Similarity 

Given two feature sets, raw similarity is the ratio of the sum of weights of matching features to
the sum of weights of shared features. Qualitatively, raw similarity is the weighted percentage
of shared features that match. Formally, raw similarity is the sum of the weights of matching
features divided by the sum of the weights of all shared feature types.

A DontCare value in a feature’s value slot represents a situation in which a feature is present
but its exact value does not matter. More specifically, it is used when the feature carries no useful
information for determining problem similarity or when its value is not known. The DontCare
value allows the system to distinguish between situations where a feature does not match and
situations where no match determination can be made.

When a comparison between features of the same type involves a DontCare value, a match
always occurs. However, to indicate the inexact nature of the match, the weight used to determine
similarity is decreased. In this system, the match is degraded by multiplying the weight from the
feature type description by 0.95. Thus, the weight added to the sum of matching weights for raw
similarity is 0.95 times its original value, while the total possible weight is computed with the
original weight value.
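Raw similarity with DontCare degradation can be sketched as follows. This is a minimal Python illustration under several assumptions. Feature sets are dictionaries keyed by feature-type name, DontCare is represented by the string "DontCare", and per-type comparison functions are passed in explicitly. In the usage example every comparison is plain equality, so <<=, 11>> simply mismatches <<=, 10>>. The example is not meant to reproduce the values in Table 1.

```python
import operator
from typing import Any, Callable, Dict

DONT_CARE = "DontCare"    # representation of the DontCare value (assumption)
DONT_CARE_FACTOR = 0.95   # degradation applied to DontCare matches


def raw_similarity(problem: Dict[str, Any], case: Dict[str, Any],
                   weights: Dict[str, float],
                   compare: Dict[str, Callable[[Any, Any], bool]]) -> float:
    """Weighted fraction of shared features that match."""
    shared = problem.keys() & case.keys()
    # The denominator always uses the original, undegraded weights.
    total = sum(weights[t] for t in shared)
    if total == 0:
        return 0.0
    matched = 0.0
    for t in shared:
        a, b = problem[t], case[t]
        if a == DONT_CARE or b == DONT_CARE:
            matched += DONT_CARE_FACTOR * weights[t]   # inexact match
        elif compare[t](a, b):
            matched += weights[t]                      # exact match
        # otherwise a mismatch contributes nothing
    return matched / total


types = ["search", "fft", "heat", "power"]
weights = {t: 1.0 for t in types}
compare = {t: operator.eq for t in types}
problem = {"search": True, "fft": False, "heat": DONT_CARE, "power": ("<=", 10)}
case = {"search": True, "fft": False, "heat": DONT_CARE, "power": ("<=", 11)}
print(round(raw_similarity(problem, case, weights, compare), 4))  # 0.7375
```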

An example of how DontCare values are used arises when retrieving objects whose constraints
are not specified. Specific values for a particular constraint may be unknown because constituent
components are not yet described in enough detail. Given a choice between such a component
and a component where the constraint is known to be violated, the former, a potential solution,
should be preferred. Using the DontCare feature value, rather than a mismatch value or omitting
the feature, makes the possible solution rank above the solution known to be incorrect. If a
solution were known to be correct, it would in turn be preferred over the potential solution.

The  Possible  Match  Ratio 

The second component of the similarity value, the possible match ratio, is the ratio of the number of
shared features to the total number of features defined for the case being considered. Its objective
is to introduce specificity into the similarity metric. Given two cases with equal raw similarity
values with respect to the current problem, the possible match ratio prefers the case matching
the higher percentage of its features, which involves more information in the comparison.

In addition to preferring more information, the possible match ratio allows a loose definition of
solution categories. Consider two problems, one described by features specifying input preconditions
and output postconditions, and the other described by features specifying representation
characteristics. The first feature set describes a problem best solved using a data transform, while
the second describes a problem best solved using a data type. These two feature sets could
conceivably share a small number of feature types. If those features match, the raw similarity metric
has no means of determining that most features cannot be compared and would return a deceptively
high similarity value. The possible match ratio differentiates between the two solution categories
because the feature sets share few feature types.

Similarity 

The final similarity value is the product of the raw similarity value and the possible match ratio.
Table 1 shows the results of comparing two sets of features with the problem’s feature set. Note
that the similarity of the second set is lower because of a mismatch in the power consumption feature.
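Combining the two components gives the final measure. This self-contained Python sketch makes simplifying assumptions. The names are hypothetical, all weights default to 1, and plain equality stands in for each feature type's comparison function. The second case shares only two of its five feature types with the problem. Its perfect raw similarity is therefore scaled down by the possible match ratio, which mirrors the data-transform versus data-type situation described above.

```python
from typing import Any, Dict, Optional

DONT_CARE = "DontCare"
DONT_CARE_FACTOR = 0.95


def raw_similarity(problem: Dict[str, Any], case: Dict[str, Any],
                   weights: Optional[Dict[str, float]] = None) -> float:
    weights = weights or {}
    shared = problem.keys() & case.keys()
    total = sum(weights.get(t, 1.0) for t in shared)
    if total == 0:
        return 0.0
    matched = 0.0
    for t in shared:
        w = weights.get(t, 1.0)
        a, b = problem[t], case[t]
        if a == DONT_CARE or b == DONT_CARE:
            matched += DONT_CARE_FACTOR * w
        elif a == b:  # stand-in for the feature type's comparison function
            matched += w
    return matched / total


def possible_match_ratio(problem: Dict[str, Any], case: Dict[str, Any]) -> float:
    """Shared feature types divided by the feature types defined for the case."""
    if not case:
        return 0.0
    return len(problem.keys() & case.keys()) / len(case)


def similarity(problem, case, weights=None) -> float:
    return raw_similarity(problem, case, weights) * possible_match_ratio(problem, case)


problem = {"search": True, "fft": False, "ordered": False, "power": ("<=", 10)}
transform_case = {"search": True, "fft": False, "ordered": False, "power": ("<=", 10)}
type_case = {"search": True, "fft": False,
             "representation": "list", "indexed": True, "mutable": True}
print(similarity(problem, transform_case), similarity(problem, type_case))  # 1.0 0.4
```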

3.3.4  Functional  Similarity 

The simplest approach to retrieval is to calculate a similarity value for each element of the case-base
with respect to the problem and choose the most similar case. The result is a table much like
Table 1, extended to the entire case-base. This approach is not practical for large case-bases because
of the cost of the similarity calculation. Instead, solutions matching critical features are retrieved
first and then ordered using the complete similarity metric.

To accomplish this, the retrieval system maintains indexes for features representing functional
characteristics; these are referred to as important features. After the problem feature set is
generated, its important features are extracted. Indexes statically maintained by the retrieval system
are then used to retrieve the set of cases whose important features match the problem features exactly.

Static indexes are created when a case is added to the case-base. Feature values are used as
retrieval keys, so cases with features matching a particular value can be retrieved directly without
a similarity calculation. Important features include input-types, output-types, and some other
features computed from functional requirements. In general, features computed from constraints are
not important features; instead, they serve to choose the best solution from all those satisfying the
functional requirements.
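A static index of this kind can be approximated with a dictionary keyed by (feature type, value). This Python sketch rests on assumptions. It treats `input-types`, `output-types`, and `search` as the important features, following the examples in the text. List values are sorted into tuples so that bag-equal values share a key. The class and case names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Optional, Set, Tuple

# Hypothetical choice of important (functional) features.
IMPORTANT = ("input-types", "output-types", "search")


def key_of(value: Any) -> Hashable:
    """Make a feature value usable as a dict key; lists are bag-compared,
    so they are sorted into tuples."""
    return tuple(sorted(value)) if isinstance(value, list) else value


class StaticIndex:
    def __init__(self, important=IMPORTANT):
        self.important = important
        self.index: Dict[Tuple[str, Hashable], Set[str]] = defaultdict(set)

    def add_case(self, name: str, features: Dict[str, Any]) -> None:
        # Index entries are created once, when the case enters the case-base.
        for t in self.important:
            if t in features:
                self.index[(t, key_of(features[t]))].add(name)

    def candidates(self, problem: Dict[str, Any]) -> Set[str]:
        """Cases whose important features all equal the problem's values."""
        result: Optional[Set[str]] = None
        for t in self.important:
            if t not in problem:
                continue
            hits = self.index.get((t, key_of(problem[t])), set())
            result = set(hits) if result is None else result & hits
        return result if result is not None else set()


idx = StaticIndex()
search_feats = {"input-types": ["integer", "seq(element)"],
                "output-types": ["element"], "search": True}
idx.add_case("linear-search", search_feats)
idx.add_case("batch-sequential", search_feats)
idx.add_case("quicksort", {"input-types": ["seq(element)"],
                           "output-types": ["seq(element)"], "search": False})

problem = {"input-types": ["seq(element)", "integer"], "output-types": ["element"],
           "search": True, "power": ("<=", 10)}
print(sorted(idx.candidates(problem)))  # ['batch-sequential', 'linear-search']
```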

All features are used to determine the final similarity between each case in the initially retrieved
set and the problem. Because all candidate cases match with respect to the important features, the
features representing other constraints determine which of the functionally similar cases is the best
solution. The case returned by the retrieval process is the case from the initially retrieved set whose
similarity with the problem is maximal.

The two-stage retrieval process results from two observations. First, design is viewed as a
process of finding a set of functionally correct solutions and then using problem constraints to select
an optimal solution from them. Important features indicate the primary function of the artifact;
the remaining features describe the constraints under which the solution exists. Second, initial
retrieval is achieved using statically maintained indexes, without the cost of calculating similarity.
Similarity is then calculated only over this subset of the original case-base, which dramatically
reduces retrieval cost compared with a brute-force approach that calculates similarity for every
element in the case-base.
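The two-stage retrieval can be outlined as an exact index lookup followed by ranking. In this Python sketch the composite index key, the case names, and the power values are hypothetical. `toy_similarity` is a deliberately simplified stand-in (equal weights, plain equality, DontCare counted as 0.95) for the raw similarity times possible match ratio measure. With the power situation described for linear search (violated) and batch sequential search (unknown), the batch sequential case is selected.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional

Features = Dict[str, Any]
IMPORTANT = ("input-types", "output-types", "search")  # hypothetical choice


def important_key(features: Features) -> tuple:
    """Composite key over the important features; list values are
    bag-compared, so they are sorted into tuples."""
    values = (features.get(t) for t in IMPORTANT)
    return tuple(tuple(sorted(v)) if isinstance(v, list) else v for v in values)


def build_index(case_base: Dict[str, Features]) -> Dict[tuple, List[str]]:
    """Stage-1 index, built statically before any problem arrives."""
    index: Dict[tuple, List[str]] = defaultdict(list)
    for name, feats in case_base.items():
        index[important_key(feats)].append(name)
    return index


def retrieve(problem: Features, case_base: Dict[str, Features],
             index: Dict[tuple, List[str]],
             similarity: Callable[[Features, Features], float]) -> Optional[str]:
    # Stage 1: exact lookup on important features, no similarity computed.
    candidates = index.get(important_key(problem), [])
    if not candidates:
        return None
    # Stage 2: rank only the functionally correct candidates.
    return max(candidates, key=lambda name: similarity(problem, case_base[name]))


def toy_similarity(p: Features, c: Features) -> float:
    """Simplified stand-in: equal weights, plain equality,
    and a DontCare on either side counted as a 0.95 match."""
    shared = p.keys() & c.keys()
    if not shared:
        return 0.0
    score = 0.0
    for t in shared:
        if p[t] == "DontCare" or c[t] == "DontCare":
            score += 0.95
        elif p[t] == c[t]:
            score += 1.0
    return score / len(shared)


common = {"input-types": ["integer", "seq(element)"],
          "output-types": ["element"], "search": True}
case_base = {
    "linear-search": {**common, "power": ("<=", 11)},     # constraint violated
    "batch-sequential": {**common, "power": "DontCare"},  # power unknown
    "quicksort": {"input-types": ["seq(element)"],
                  "output-types": ["seq(element)"], "search": False},
}
problem = {**common, "power": ("<=", 10)}
index = build_index(case_base)
print(retrieve(problem, case_base, index, toy_similarity))  # batch-sequential
```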

Consider the feature sets generated for linear search and batch sequential search shown in
Figure 5. Using only important features, these two cases are identical: both search arrays of
elements and return the indicated element if it is found. Thus, the initial retrieval would return both,
but would eliminate cases for sorting, FFTs, and cases whose range and domain do not match.

Although the two solutions are functionally equivalent, the power feature representing a constraint
differs. In the linear search entity the power constraint from the original specification
is violated, while in the batch sequential entity the power is not known. (See Table 1 for exact
similarity calculations.) Thus, the similarity of the linear search case is lessened and the batch
sequential option is preferred. For the batch sequential option the power constraint cannot be
verified, but, unlike the linear search option, it is not known to be violated. This is an example of
using the DontCare feature value to indicate a situation between a perfect match and a mismatch.

The batch sequential search architecture returned is the same as the architecture shown in
Figure 3. Note that a behavioral specification of the bin_search entity exists; however, no
specification for the sort entity exists aside from its VSPEC description. This represents a complete
specification of the problem and can be viewed as a solution. However, it is possible to repeat the
process and attempt to synthesize a specific component for the sort description. This is achieved
during adaptation by repeating the synthesis process using the requirements from the sort description.

3.4  Adaptation 

The  most  similar  case  found  by  the  retrieval  process  is  modified  to  fit  the  current  problem  by 
the  adaptation  process.  Adaptation  employs  two  independent  methods.  The  first  is  application 
of  adaptation  rules.  The  second  is  replacement  of  case  parts  by  generating  a  sub-problem  and 
recursively  calling  the  case-based  reasoner. 

3.4.1  Rule-Based  Adaptation 

An adaptation rule is a Refine transform. The antecedent is a predicate accepting three arguments:
the problem being solved, the problem solved by the case, and the specification fragment being
adapted. The consequent is a Refine predicate, accepting the same arguments, that implements
the change to the specification fragment. A list of adaptation rules is maintained by the system.
Each rule is evaluated during the first stage of adaptation. Conflict resolution is achieved using a
static ordering based entirely on the order in which rules appear in the rule-base.
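Rule application with purely order-based conflict resolution might look like the following Python sketch. It assumes rules are (antecedent, consequent) pairs of Python callables that take the same three arguments as the Refine rules. The `rename-key` rule, the `key_name` slot, and the port-list fragment are invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

# Each callable receives (problem, case_problem, fragment), mirroring the
# three arguments of the Refine rules; the concrete types are assumptions.
Antecedent = Callable[[Any, Any, Any], bool]
Consequent = Callable[[Any, Any, Any], None]


@dataclass
class AdaptationRule:
    name: str
    antecedent: Antecedent
    consequent: Consequent


def apply_rules(rules: List[AdaptationRule], problem, case_problem, fragment) -> List[str]:
    """Evaluate every rule once, in rule-base order. Order is the only conflict
    resolution: a later rule sees the changes made by earlier ones."""
    fired = []
    for rule in rules:
        if rule.antecedent(problem, case_problem, fragment):
            rule.consequent(problem, case_problem, fragment)
            fired.append(rule.name)
    return fired


# Hypothetical rule: rename the case's key port to the name the problem uses.
def needs_key_rename(problem, case_problem, fragment) -> bool:
    old, new = case_problem["key_name"], problem["key_name"]
    return old != new and old in fragment["ports"]


def rename_key(problem, case_problem, fragment) -> None:
    old, new = case_problem["key_name"], problem["key_name"]
    fragment["ports"] = [new if p == old else p for p in fragment["ports"]]


RULES = [AdaptationRule("rename-key", needs_key_rename, rename_key)]
fragment = {"ports": ["input", "k", "output"]}
fired = apply_rules(RULES, {"key_name": "key"}, {"key_name": "k"}, fragment)
print(fired, fragment["ports"])  # ['rename-key'] ['input', 'key', 'output']
```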

VHDL and VSPEC components are stored using an object-based abstract syntax tree common
to REFINE parsing activities [1]. This representation makes application of adaptation rules much
easier because the object model, rather than plain text, is manipulated; adaptation rules need not
parse text to perform their operations. Retrieving the source code from the object model simply
requires calling a single print routine, so there is no loss of solution generality.

A frequently used adaptation rule performs variable substitution. This function gathers all
identifiers and references from an entity structure and applies a transformation to them. In a
semantically correct abstract syntax tree, each variable has a definition and several references.
To change the name of a variable, the name must be changed at the definition point and at each
reference point. The transformation is applied to each node in the abstract syntax tree. If a node is
an identifier reference, the transformation checks the identifier name and changes references matching
the old variable name to the new variable name. Similarly, it finds the identifier definition and changes
its name to the new name. Without the abstract syntax tree, the source VSPEC would require lexical
and syntactic analysis to perform this operation.


function subst-variable (the-entity: entity, old-name: symbol, new-name: symbol) =
  let (idents      = entity-port(the-entity),
       idents-refs = descendants-of-class(the-entity, 'ident-ref))
    (ref in idents-refs &
     ident-name(ref) = old-name
       -->
     ident-name(ref) = new-name);
    (ref in idents &
     ident-name(ref) = old-name
       -->
     ident-name(ref) = new-name)
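The benefit of operating on a syntax tree rather than on text can be illustrated with Python's standard `ast` module. It is used here as an analogue, not as the Refine object model. This sketch renames a variable at its definition point (a function parameter, playing the role of a port) and at every reference. It then regenerates source with `ast.unparse`, which requires Python 3.9 or later. Like `subst-variable`, it renames blindly and ignores scoping. Attribute names such as `e.key` are untouched because they are not `Name` nodes.

```python
import ast


class RenameVariable(ast.NodeTransformer):
    """Rename one variable at its definition point and every reference."""

    def __init__(self, old: str, new: str) -> None:
        self.old, self.new = old, new

    def visit_arg(self, node: ast.arg) -> ast.arg:
        # Definition point: a function parameter (the analogue of a port).
        if node.arg == self.old:
            node.arg = self.new
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Reference points (and simple assignment targets) are Name nodes.
        if node.id == self.old:
            node.id = self.new
        return node


def subst_variable(source: str, old: str, new: str) -> str:
    tree = RenameVariable(old, new).visit(ast.parse(source))
    return ast.unparse(tree)  # regenerate text from the tree (Python 3.9+)


src = "def search(items, k):\n    return [e for e in items if e.key == k]\n"
print(subst_variable(src, "k", "key"))
```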


3.4.2  Sub-Problem  Based  Adaptation 

The  second  adaptation  method  is  case  element  replacement.  This  involves  defining  a  function 
or  goal  associated  with  the  fragment  and  using  the  internal  environment  defined  by  the  case  to 
constrain  possible  solutions  [6]. 
Case fragment replacement is used to replace a portion of a structural architecture specification.
Because a structural architecture is a collection of components, identifying a case fragment
for replacement amounts to identifying an appropriate component. To define a function for the case
fragment, the reasoning process uses the difference between the system state before and after the
execution of the component action. The reasoning process assumes that the component caused the
difference intentionally; thus, the difference defines the goal of the component. The preconditions
and the external environment together define constraints on the new synthesis problem. The
difference between the system state before and after component execution is obtained either from
the VSPEC representation of the component, or from the VSPEC defining the outputs of systems
providing input to the component together with the preconditions of components receiving its output.
The result is a new problem whose solution can replace the original.³
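Deriving a sub-problem from a component's effect can be sketched by modeling system states as sets of facts. In this Python illustration, the fact strings, the `SubProblem` record, and the treatment of preconditions and environment as simple fact sets are assumptions. The point is only that the goal is the before/after difference and the constraints are the union of preconditions and environment. The example loosely mirrors the sorter in the batch sequential architecture, whose derived goal corresponds to the ordered and permutation features of the sort problem.

```python
from typing import FrozenSet, Iterable, NamedTuple


class SubProblem(NamedTuple):
    goal_added: FrozenSet[str]    # facts the replacement must establish
    goal_removed: FrozenSet[str]  # facts the replacement must retract
    constraints: FrozenSet[str]   # preconditions plus external environment


def derive_subproblem(before: Iterable[str], after: Iterable[str],
                      preconditions: Iterable[str],
                      environment: Iterable[str]) -> SubProblem:
    """Treat the component's before/after state difference as intentional,
    so the difference becomes the goal of the new synthesis problem."""
    b, a = frozenset(before), frozenset(after)
    return SubProblem(goal_added=a - b,
                      goal_removed=b - a,
                      constraints=frozenset(preconditions) | frozenset(environment))


# Hypothetical facts for the sorter in the batch sequential architecture.
sub = derive_subproblem(
    before={"available(input)"},
    after={"available(input)", "ordered(sorted_array)",
           "permute(sorted_array, input)"},
    preconditions={"is_array(input)"},
    environment={"power <= 5"},
)
print(sorted(sub.goal_added))  # ['ordered(sorted_array)', 'permute(sorted_array, input)']
```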

Component replacement may also occur when instantiating a general architecture. Recall that
entity structures referenced by an architecture may be defined using only VSPEC and need not
have a VHDL implementation. Thus, the requirements of the component are expressed without the
specifics of the implementation. The VSPEC is transformed into a problem description, and synthesis
yields a VHDL component satisfying the requirements.

As  an  example  of  case  component  replacement,  consider  synthesis  of  an  architecture  for  the 
sort  entity.  The  DRIOC  form  of  the  vspec  is  as  follows: 


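$$
\begin{aligned}
D &= \mathit{sequence}(\mathit{element}) \times \mathit{integer}\\
R &= \mathit{element}\\
I(x) &= \mathit{true}\\
O &= \forall e : \mathit{element} \cdot \mathit{output} = e \Leftrightarrow e \in \mathit{input} \land k = \mathit{key}(e)\\
C &= \mathit{power} < 10
\end{aligned}
$$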

³ For a detailed discussion of this adaptation scheme, please see [4].
The feature set associated with this problem is similar to the feature set for the original search
problem,  but  no  constraints  are  specified  and  feature  values  are  changed  appropriately. 

{<input-types, [sequence(element)]>,
 <output-types, [sequence(element)]>,
 <power, DontCare>,
 <search, false>,
 <fft, false>,
 <ordered, true>,
 <permutation, true>,
 ...}

The retrieval activity here is identical to retrieval of the batch sequential architecture, and the
discussion will not be repeated. Any appropriate sorting architecture may be retrieved given the
current set of features. For this example, assume that a quicksort entity, described by the VSPEC
entity shown in Figure 6, is retrieved.

The resulting VHDL description is the batch sequential architecture combined with the quicksort
architecture. This represents a new solution at a lower level of abstraction. Before it is accepted
as a solution, the new system must be evaluated with respect to constraints.

3.5  Evaluation 

Following synthesis of the potential solution, the case-based reasoner evaluates the solution
to determine its fitness. In this case-based reasoning system, evaluation consists exclusively of
determining whether the proposed solution meets the specified constraints.

Recall that the constrained by clause defines constraints the system must satisfy. These
constraints are translated into features for the retrieval process. If the solution is monolithic,
constraint satisfaction is a simple comparison of $C(c_1, \ldots, c_n)$ from the problem description
with that of the proposed solution. However, when the solution is a collection of components, more
complex constraint verification must be applied.
Constraint verification is achieved by specifying constraint behavior and transforming that
behavioral description into Refine theories. Given the constraints from the high-level specification
and a set of component constraints, the Refine theories determine whether the composition of
component constraints continues to meet the higher-level constraints. Theories currently
exist in this system to evaluate heat dissipation, power consumption, clock speed, pin-to-pin timing,
and area. Because performance constraints are checked in the earliest stages of synthesis, such issues
are managed concurrently with the synthesis activities.

With the batch sequential search system completed, constraints on the subcomponents of the
batch sequential search algorithm are now known. The theory of power consumption used by this system
states that the total power used by a device equals the sum of the power used by the device’s
components. Instantiated for this problem, the total power used by bin_search and quicksort
must be less than 10 watts. The component constraints state that they consume no more than
4 watts and no more than 5 watts, respectively. Using interval arithmetic, the sum is no more
than 9 watts, so the 10-watt constraint is met. If a constraint violation is discovered, the offending
potential solution is discarded and the reasoning process is repeated in an attempt to find a better
solution. Alternative approaches include simply reporting the constraint violation or involving the
user in the decision process.
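The power check above is a short interval computation. This Python sketch assumes power is non-negative, so each "no more than N watts" bound becomes the interval [0, N]. The function names are hypothetical. Using closed intervals keeps the composition rule, which sums component power, a one-line addition.

```python
from typing import Iterable, Tuple

Interval = Tuple[float, float]  # closed interval [low, high] in watts


def add(a: Interval, b: Interval) -> Interval:
    """Interval addition: [a_lo + b_lo, a_hi + b_hi]."""
    return (a[0] + b[0], a[1] + b[1])


def total_power(components: Iterable[Interval]) -> Interval:
    # Power theory as stated: device power is the sum of component power.
    total: Interval = (0.0, 0.0)
    for c in components:
        total = add(total, c)
    return total


def meets_upper_bound(x: Interval, limit: float) -> bool:
    # Strict "less than", matching the 10-watt constraint in the text.
    return x[1] < limit


bin_search = (0.0, 4.0)   # "no more than 4 watts"
quicksort = (0.0, 5.0)    # "no more than 5 watts"
total = total_power([bin_search, quicksort])
print(total, meets_upper_bound(total, 10.0))  # (0.0, 9.0) True
```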

Constraints are evaluated both when retrieving and when evaluating potential solutions. At
each stage of the synthesis activity, non-functional requirements are evaluated concurrently with
functional requirements, which implements a simple concurrent engineering synthesis process.
4 Limitations

Early experimentation indicates that this synthesis approach is effective with small case-bases in
reasonably restricted domains. The approach is currently being extended to solve co-design problems,
and the initial case-base is being extended. Several limitations, although not fatal, have been
identified.

4.1  Case-Base  Construction 

The greatest potential limitation of this approach is case-base construction. The system cannot use
a component that is not defined in its case-base, implying that either a large case-base must be
developed or individual case-bases must be developed for each domain. This requires identification
of a core set of components with VSPEC annotations.

VHDL libraries currently exist and are being aggressively constructed in the DSP domain. However,
these libraries are not annotated with VSPEC, which forces back-annotation either by hand or
by an automated approach. Hand annotation is time-consuming and difficult, and automated annotation
is not practical at this time.

An alternative approach is to implement other synthesis techniques that generate innovative
solutions and use them to augment the case-base. This approach is currently being pursued
through integration with the KIDS software synthesis environment and transformation-based
synthesis techniques. New solutions are generated when necessary and old solutions are reused
when possible. It should be noted that, although they do extend the case-base, these other
techniques are also limited by their own synthesis knowledge.
4.2 Features and Feature Generation

Each synthesis domain requires the definition of feature types and feature generation functions. Once
cases are identified, they must be indexed and stored in the case-base. As with case-base construction,
either a universal set of features can be defined or individual feature sets can be developed for
each domain. Given the trade-off between computational complexity and the brittleness caused by
domain-specific features, the second option is the obvious choice: without exploiting some
domain-specific information, retrieval becomes computationally prohibitive.

4.3  Solution  Correctness 

Currently there is no guarantee that any given solution is correct. If a VHDL solution is synthesized,
simulation is available for some limited correctness evaluation. An approach currently being
developed is to apply theorem proving to the VSPEC description. The limitations of such
an approach as a retrieval technique were presented earlier. However, once a solution is found,
the problem is reduced to checking a single solution. Even so, pragmatic, efficient theorem-proving
techniques are required for this to be practical.

5  Related  Work 

As VSPEC is a Larch interface language for VHDL, it borrows from the construction of other interface
languages. Specifically, VSPEC is styled after LM3, the Larch interface language for Modula-3 [10].
Odyssey Research Associates (ORA) is currently developing an alternative Larch interface language for
VHDL [9]. This language does not support representation of constraints other than time and is
targeted at formal analysis rather than synthesis. ORA is attempting to generate a formal
semantics for VHDL using LSL for proving correctness. ORA’s interface language also differs in its
treatment of time: an absolute-time temporal logic is used to specify the function of an entity.
Thus one can specify that a predicate becomes true at a specific time using the notation “P(x)@t”.
The VSPEC notation, in contrast, specifies time intervals as constraints independent of system
function.

Another attempt to annotate VHDL is VAL [5]. VAL annotates all aspects of the VHDL design: all
signals in the namespace of the VHDL representation are in the namespace of the VAL annotation.
Thus, VAL annotates specific VHDL designs rather than representing requirements. ORA’s interface
language is similar in this respect, but does support separate requirements definitions.

Although VHDL is a hardware description language, synthesis of VHDL designs is a software
synthesis activity. Viewing software synthesis as a planning activity was proposed in the KBSA
effort [16, 22, 17] and used heavily in the BENTON [2] system. Both use plans to represent
and control software synthesis activities. In this system, plans are not used explicitly and represent
only the structure of solutions. Other attempts at case-based software design include CEASAR [7],
analogical reuse [11, 12], and work in derivational analogy [13]. Some also view the reuse work by
Prieto-Diaz [15] as case-based reasoning; however, it is not a heuristic approach and involves no
adaptation of final solutions.

6  Future  Work 

Current VSPEC research involves pursuing domain-specific support for specification activities and
support for formal synthesis. An important aspect of any Larch language is its associated handbook,
which is simply a collection of reusable theories defined in the shared language. Handbook
theories represent commonly used structures, algorithms, and characteristics as well as domain-specific
information. For VHDL, theories representing standard VHDL types, low-level logic functions, and
conversion routines are being implemented. In addition, libraries to support specifications involving
signal attributes such as event, stable, and delay are under development. Theories for pin-to-pin
timing, heat dissipation, power consumption, area, and clock speed have been implemented to
support constraint checking during the design process.

The isomorphic relationship between VSPEC and algebraic specifications is being used to exploit
work in formal synthesis, specifically the development of morphisms between algorithms [21]. This
involves developing and implementing theories useful in constructing multicomponent systems such
as the batch sequential search algorithm appearing earlier in this paper.

Finally, formal evaluation of specifications and solutions is being explored. Although it may
be impractical to use formal inference in the retrieval process, once a solution is found, formal
inference is a practical verification tool. Given VSPEC descriptions of both the problem and the
solution, various levels of satisfaction may be evaluated. Logical equivalence is the ideal comparison;
however, logical implication may be acceptable. Even more interesting is the use of modal logics and
antecedent discovery to correct incomplete specifications or restrict solutions.
entity search is
  port (input: in array of element;
        k: in keytype;
        output: out element);
end search;

a) VHDL entity declaration (above)
b) [Block diagram of the search entity; labels: array, key, element]

Figure 1: A VHDL entity describing a record search.
entity search is
  port (input: in array of element;
        k: in keytype;
        output: out element);
  modifies output;
  requires true;
  ensures
    output = e <=> key(e)=k and
    e in input
  constrained by
    power =< 5 and
    area =< .3
end search;

a)

Figure 2: A VSPEC entity describing a record search.
Feature Name     Problem                   Linear Search             Batch Sequential
goal             entity                    entity                    entity
input-types      [integer, seq(element)]   [integer, seq(element)]   [integer, seq(element)]
output-types     [element]                 [element]                 [element]
fft              false                     false                     false
ordered(z)       DontCare                  DontCare                  DontCare
permute(x,z)     DontCare                  DontCare                  DontCare
search           true                      true                      true
heat             DontCare                  DontCare                  DontCare
power            <=, 10                    DontCare                  <=, 11
area             DontCare                  DontCare                  DontCare
Possible Match   1.0                       1.0                       1.0
Raw Similarity   1.0                       0.978                     0.975
Similarity       1.0                       0.978                     0.975

Table 1: A subset of the features generated for the search problem and two potential solutions. The bottom rows give the calculated similarity values; all weights are assumed to be 1.
entity search is
  port (input: in array of element;
        k: in keytype;
        output: out element);
  modifies output;
  requires true;
  ensures
    output = e <=> key(e)=k and
      e in input;
  constrained by
    power =< 5 and
    area =< .3;
end search;

architecture structure of search is
  component sort
    port (input: in array of element;
          output: out array of element);
  end component;
  component bin_search
    port (input: in array of element;
          k: in keytype;
          value: out element);
  end component;
  signal sorted_array: array of element;
begin
  sort_instant: sort
    port map (input => input,
              output => sorted_array);
  search_instant: bin_search
    port map (input => sorted_array,
              k => k,
              value => output);
end structure;

entity sort is
  port (input: in array of element;
        output: out array of element);
  modifies output;
  ensures bag(input) = bag(output) and
    sorted(output);
  constrained by
    power <= 2 and
    area <= .1;
end sort;

entity bin_search is
  port (input: buffer array of element;
        k: in keytype;
        value: out element);
  modifies value;
  requires sorted(input);
  ensures
    (fa e: element)
      value = e <=> key(e)=k and
        e in input;
  constrained by
    power <= 3 and
    area <= .2;
end bin_search;

architecture behavior of bin_search is
begin
  p1: process
    -- VHDL representation of a
    -- binary search over ordered
    -- arrays
  end process;
end behavior;

Figure  3:  VSPEC  representation  of  a  search  architecture  using  a  batch  sequential  approach.  The 
original  list  is  sorted  and  a  binary  search  finds  the  desired  object  from  the  resulting  list. 
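The batch sequential decomposition can be mirrored in a short executable sketch: a sort stage feeds a binary-search stage, and the composite result is checked against the search post-condition. This is an illustrative assumption, not the VHDL behavior the figure elides: elements are modeled as (key, payload) tuples with unique keys, Python's `bisect` module stands in for the binary search process, and all function names are hypothetical.

```python
from bisect import bisect_left

def key(e):
    # Hypothetical element layout: (key, payload) tuples.
    return e[0]

def sort_stage(elements):
    """Sort component: ensures bag(input) = bag(output) and sorted(output)."""
    return sorted(elements, key=key)

def bin_search_stage(sorted_elems, k):
    """Binary search component: requires sorted(input)."""
    keys = [key(e) for e in sorted_elems]
    i = bisect_left(keys, k)
    if i < len(keys) and keys[i] == k:
        return sorted_elems[i]
    return None  # no element carries key k

def batch_sequential_search(elements, k):
    # Structural composition: the sorter's output feeds the binary search.
    return bin_search_stage(sort_stage(elements), k)

def meets_search_spec(elements, k, output):
    """output = e <=> key(e) = k and e in input, checked for every e in input."""
    return all((output == e) == (key(e) == k) for e in elements)

data = [(7, "g"), (2, "b"), (9, "i"), (4, "d")]
result = batch_sequential_search(data, 4)
print(result)                               # (4, 'd')
print(meets_search_spec(data, 4, result))   # True
```

The binary search stage is only correct because the sort stage establishes its `sorted(input)` pre-condition, which is the dependency the architecture in Figure 3 encodes.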
entity search is
  port (list: in array of element;
        k: in integer;
        output: out element);
  modifies output;
  requires true;
  ensures
    (fa e: element)
      (output = e) <=>
        (e in list and
         k = key(e));
  constrained by
    power <= 10;
end search;

Figure 4: vspec requirements for a searching component.


(a) Features from batch sequential search:

{<input-types, [sequence(element), integer]>,
 <output-types, [element]>,
 <power, DontCare>,
 <search, true>,
 <fft, false>,
 <ordered, false>,
 ...}

(b) Features from linear search:

{<input-types, [sequence(element), integer]>,
 <output-types, [element]>,
 <power, <=, 11>,
 <search, true>,
 <fft, false>,
 <ordered, false>,
 ...}

Figure 5: Features from linear and batch sequential search returned by the retrieval algorithm.


entity quicksort is
  port (input: in array of element;
        output: out array of element);
  modifies output;
  requires true;
  ensures
    bag(input) = bag(output) and
    sorted(output);
  constrained by
    power <= 5;
end quicksort;

architecture behavior of quicksort is
begin
  p1: process
    -- VHDL description of a
    -- quicksort algorithm
  end process;
end behavior;


Figure  6:  quicksort  entity  with  VSPEC  annotations  and  behavioral  VHDL  description. 
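The ensures clause of the quicksort entity is stated declaratively, while the architecture supplies the operational description. The hypothetical sketch below shows how an implementation can be checked against that clause: `collections.Counter` plays the role of `bag` (a multiset), and a list-based functional quicksort stands in for the VHDL process the figure elides. Elements are assumed to be mutually comparable, and the function names are invented for the example.

```python
from collections import Counter

def quicksort(xs):
    """Functional quicksort: partition around the first element."""
    if len(xs) <= 1:
        return list(xs)
    pivot, rest = xs[0], xs[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)

def is_sorted(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))

def meets_sort_spec(inp, out):
    """ensures bag(input) = bag(output) and sorted(output)"""
    return Counter(inp) == Counter(out) and is_sorted(out)

inp = [5, 3, 8, 3, 1]
out = quicksort(inp)
print(out)                        # [1, 3, 3, 5, 8]
print(meets_sort_spec(inp, out))  # True
```

Using a multiset rather than a set matters: an output that drops the duplicate 3 is sorted but fails `bag(input) = bag(output)`.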
Abstract

Systems engineering is the process of looking at many facets of an emerging design. A systems engineer is required to examine and reconcile many information sources when making high level design decisions. Although vhdl is an excellent digital system description language, it lacks the flexibility to address all systems level issues. Digital system behavior and structure are handled effectively, but other facets are not. vspec represents one attempt to model other facets in the vhdl framework. It adds functional requirement and performance constraint modeling to the vhdl based design process. This paper first describes vspec and its interaction with vhdl. It argues that vspec is an excellent first step towards a systems level description language. However, other facets are needed to model complete systems. A language structure for representing these facets is proposed and a potential basis for a semantic definition is identified.


1  Introduction 


Systems level design is characterized by the need to deal [...]ity arises from two sources: (i) modeling components using different computational models; and (ii) modeling different component and system facets. Different system components are best modeled using different basic semantic models. Digital electronic, analog electronic, optical, and MEMS components all have different underlying mathematical domain models. Like heterogeneous components, multiple facets of the same component require different underlying semantic models. Electromagnetic, analog, digital and constraint facets again have different underlying mathematical domain models. Further, different models may be used for the same facet under different circumstances.

*Support for this work was provided in part by the Advanced Research Projects Agency and monitored by Wright Labs under the RASSP Technology Program, contract number F33615-93-C-1316.

To address the systems level design process, vhdl must be extended to include: (i) multiple modeling paradigms for different component facets; (ii) multiple modeling paradigms for different component domains; and (iii) a means for moving information between system representations. Multiple modeling paradigms support integrated modeling. Moving information between system representations supports using multiple semantic models without forcing a single model. Interestingly, vhdl provides syntactic support for multiple modeling paradigms. The entity/architecture model supports defining both structural and behavioral models for the same component. This basic architecture has been used to extend vhdl to the analog domain and to define constraint and requirements models.

Moving information between semantic models presents a more difficult problem. Effectively, the vhdl semantics must be extended. Goguen's model of institutions [6] can be used as a basis for such modeling. Using institutions, semantic domains are defined as categories, and functors are used to define when information from one domain is valid in another.

This paper presents existing efforts to move vhdl to the systems level. First, a brief overview of vspec is presented. vspec is an interface specification language for vhdl that represents an initial attempt to model multiple component facets at the requirements level. Second, the model associating an entity with one or more architectures is extended to provide a multi-faceted model. As an example, the vspec requirements and constraint models are represented. The paper concludes by describing open semantic issues and problems that must be addressed.

2  VSPEC: A First Step

vhdl provides users with a means for modeling both the behavior and structure of a digital system. It provides users with an operational language for describing the behavior of a component. This language subset, referred to as behavioral vhdl, allows users to describe data transforms and control structures for components using a programming language style syntax. vhdl also provides users with a declarative language for describing the structure of a system. This language, referred to as structural vhdl, allows users to describe interconnections between components using a simple netlist-based module interconnect language.

Using behavioral and structural architectures of the same component allows vhdl users to model two facets of components and systems: function and structure. Thus, a user might provide a high level, black-box behavioral description and use that description as a basis for refinement into a structural system decomposition. Such activities are common in top-down design processes, making these facets and their interaction quite useful to systems designers.

Behavioral and structural vhdl share a common simulation-based semantics that allows information from one facet to be visible in the other. More specifically, the results of simulating structural and behavioral representations of a component can be directly compared. Thus, designers are able to evaluate the results of design iterations by simulating and comparing results.
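To make the comparison concrete, the sketch below treats two Python functions as stand-ins for simulating a behavioral and a structural representation of the same search component, and compares their results over randomly generated stimuli. It is a hypothetical illustration rather than vhdl simulation: `behavioral_search` and `structural_search` are invented names, keys are assumed unique, and a fixed random seed keeps the run repeatable.

```python
import random

def behavioral_search(elements, k):
    """Black-box behavioral model: linear scan for the element with key k."""
    for e in elements:
        if e[0] == k:
            return e
    return None

def structural_search(elements, k):
    """Structural model: a sort component feeding a binary-search component."""
    ordered = sorted(elements)
    lo, hi = 0, len(ordered)
    while lo < hi:  # lower-bound binary search on the key
        mid = (lo + hi) // 2
        if ordered[mid][0] < k:
            lo = mid + 1
        else:
            hi = mid
    return ordered[lo] if lo < len(ordered) and ordered[lo][0] == k else None

def compare_simulations(trials=200, seed=1):
    """Drive both models with the same stimuli and compare their outputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        keys = rng.sample(range(50), 8)  # unique keys
        elements = [(k, f"v{k}") for k in keys]
        probe = rng.randrange(50)
        if behavioral_search(elements, probe) != structural_search(elements, probe):
            return False
    return True

print(compare_simulations())  # True
```

Agreement over a finite set of stimuli is evidence, not proof, of equivalence; this limitation is part of what motivates the declarative requirements facet introduced next.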

Although vhdl has excellent operational specification capabilities, their application during the design process is limited. One limitation noted in our research activities is at the abstract requirements specification level. Specifically: (i) vhdl's operational semantics are not suitable for abstract functional requirements; and (ii) vhdl provides no means for describing performance requirements. These two [...] useful in the design process. vspec is an initial attempt to address these facets in the context of vhdl.


2.1  An  Example 


vspec uses a modified axiomatic specification technique for modeling a component's function and performance requirements. A pre-condition and a post-condition are defined to indicate: (i) what must be true in the current state; and (ii) what must be true in the next state. These pre- and post-conditions follow the traditional axiomatic semantics presented by [...] and are implemented using a Larch Interface Language approach [8]. This axiomatic specification is augmented with an activation condition indicating what circumstances cause the component to activate. The activation condition is needed because of the concurrent nature of vhdl components, in contrast to the serial nature of software components.

The axiomatic specification approach is further modified to describe performance requirements. Such performance requirements are modeled using a simple declarative semantics to express relations over constraint variables. They [...]tional requirements.

To understand how vspec defines requirements and constraints, consider the simple search component presented in Figure 1. This component accepts an array of elements and a key and returns the array element associated with that key. Changing the value of either the key input or the array to be searched causes the component to activate.

package search_types is
  type E is mutable;
  type K is mutable;
  type E_array is array (integer range <>) of E;
end search_types;

use work.search_types.all;
entity search is port
  (input: in E_array;
   k: in K;
   output: out E);
  includes KeyToElement(E, K);
  includes Area, Power, Frequency;
  modifies output;
  sensitive to
    k'event or input'event;
  requires true;
  ensures forall e: E
    ((output'post = e) iff
     (key(e) = k
      and e ∈ input));
  constrained by
    area < (3 um * 5 um)
    and power < 10 mW
    and clock_frequency < 50 MHz;
end search;

KeyToElement(E, K): trait
  introduces
    key: E -> K

Figure 1. An example component defining requirements for a simple search component.


2.2  Functional  Requirements 

The basic specification model used for vspec functional requirements is a state machine. Figure 3 represents information defined by a vspec component. Using the axiomatic style, a state machine is specified. Pre-conditions and post-conditions define the output and next state functions, while entity ports and vspec state variables define component state.

Figure 2. A graphical representation of VSPEC information representation.


[Diagram: input ports feed entity E, which computes F(x,s) over its state s and drives the output ports.]

Figure 3. State-based model of functional specification.


Functional requirements are modeled using the activation condition, pre-condition and post-condition. These are specified in the sensitive to, requires, and ensures clauses, respectively. The requires clause defines a pre-condition that must be true in the current state for the component to execute correctly. The ensures clause defines a post-condition the component must make true in the next state, given that the pre-condition is true in the current state. Given that x is the collection of entity ports and vspec state variables providing input or state, and z is the collection of entity ports and vspec state variables providing output or next state, the relationship defined by the requires and ensures clauses can be defined as:


$$\forall x \cdot \mathit{requires}(x) \Rightarrow \exists z \cdot \mathit{ensures}(x, z) \qquad (1)$$

The axiomatic specifications define the data transformation performed by the component. These specifications define requirements on how the component transforms data by defining relations between inputs, current state and outputs. Specifically, any implementation of these requirements, F(x), must provide a witness for z that satisfies the requirements. Skolemizing Equation 1 results in the following correctness condition for the data transformation:


$$\forall x \cdot \mathit{requires}(x) \Rightarrow \mathit{ensures}(x, F(x)) \qquad (2)$$

Although simple, this relationship is important because of the connection it provides between the requirements defined by vspec and the execution of a vhdl implementation. Given only these requirements, any vhdl implementation of an F(x) satisfying Equation 2 is a correct implementation. Thus, the requirements facet is connected semantically to the behavioral or structural facet. Further, the requirements facet could be associated with a test facet or any other facet defined for a component.

The ensures and requires clauses of the example search component (Figure 1) define the following axiomatic requirements:


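$$\forall \mathit{input} : E\_array,\ k : K \cdot \mathit{true} \Rightarrow \forall e : E,\ \exists \mathit{output} : E \cdot \mathit{output} = e \Leftrightarrow \mathit{key}(e) = k \wedge e \in \mathit{input} \qquad (3)$$

Simplifying the implication via implication elimination (true ⇒ P is equivalent to P) yields:

$$\forall \mathit{input} : E\_array,\ k : K,\ e : E \cdot \exists \mathit{output} : E \cdot \mathit{output} = e \Leftrightarrow \mathit{key}(e) = k \wedge e \in \mathit{input} \qquad (4)$$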


The requirement defined in Equation 4 states that an output of this component is correct if and only if: (i) the output is in the input set; and (ii) the key associated with the output is equal to the input key. Any component meeting this requirement is potentially a solution to the defined problem. Note that, using the declarative representation, requirements are stated directly rather than by identifying a specific candidate solution.
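Equation 2 also suggests a simple testing procedure. The sketch below is a hypothetical illustration rather than a vspec tool: it searches a finite set of inputs for a counterexample to requires(x) ⇒ ensures(x, F(x)), using the search requirements of Equation 4. Integers stand in for elements, `key(e) = e % 10` is an invented key function, and e ranges only over the elements of the input rather than all of E, so the check is weaker than Equation 4 itself.

```python
def counterexample(requires, ensures, F, inputs):
    """Return the first x with requires(x) true but ensures(x, F(x)) false, else None."""
    for x in inputs:
        if requires(x) and not ensures(x, F(x)):
            return x
    return None

def key(e):
    return e % 10  # hypothetical key function

def requires(x):
    return True  # "requires true"

def ensures(x, output):
    elements, k = x
    # Equation 4, restricted to e in input: output = e iff key(e) = k and e in input
    return all((output == e) == (key(e) == k) for e in elements)

def F_good(x):
    # Candidate implementation: return the element whose key matches.
    elements, k = x
    return next((e for e in elements if key(e) == k), None)

def F_bad(x):
    # Faulty implementation: always returns the first element.
    elements, k = x
    return elements[0]

inputs = [((12, 25, 31), k) for k in range(10)]
print(counterexample(requires, ensures, F_good, inputs))  # None
print(counterexample(requires, ensures, F_bad, inputs))   # ((12, 25, 31), 0)
```

A `None` result only means no counterexample exists among the sampled inputs; establishing Equation 2 for all inputs still requires proof.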

The activation condition defined in the sensitive to clause defines when a component becomes active. Like the pre-condition, the activation condition must be true for the component to function. However, if the pre-condition is false, the component's behavior is undefined. In contrast, if the activation condition is false, the component maintains its current state. The activation condition models events that cause the component to perform its task.
When components are interconnected, activation conditions model interaction between components. Activation conditions are defined over the same symbol set as pre-conditions. They monitor inputs and state to determine when the component should perform its data transform. When inputs are connected to outputs from other components or to inputs from outside the system, control is communicated between components. Activation conditions are modeled using a process algebra. Process algebras are designed specifically to model reactive systems and suit the semantic needs of activation conditions nicely. Specifically, vspec activation conditions are modeled using CSP [10].

Each vspec entity is modeled as a CSP process. The alphabet of each process is the set of system states where its associated activation condition is true. By definition, the CSP process ignores any symbol not in its alphabet. Thus, any state where the component is not active is ignored by the component.

Given an activation condition A(x), the set of states where A is active is defined as $\alpha_A = \{s : S \mid A(s)\}$. Using $\alpha_A$, the process P associated with a vspec component is defined loosely as:¹

$$P_s = (e : \alpha_A \rightarrow P_{s'}) \qquad (5)$$

where s and s' are the current and next states and satisfy the axiomatic requirements. Briefly, the notation indicates that a process P in state s waits for an event e from $\alpha_A$. Because $\alpha_A$ only contains states where the activation condition is true, P will effectively ignore all other states. For any e in $\alpha_A$, a process P in state s' results. If it is known that some function F satisfies the axiomatic requirements, then the previous equation can be rewritten as:

$$P_s = (e : \alpha_A \rightarrow P_{F(s)}) \qquad (6)$$

Note that even within vspec's functional modeling component, two facets exist. Specifically, the axiomatic model [...]

In the semantics of vspec, these two facets communicate by sharing a common definition in the Larch Shared Language [8].

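The sensitive to clause from the search component (Figure 1) defines the following activation condition:

$$\texttt{k'event} \vee \texttt{input'event} \qquad (7)$$

This corresponds to the predicate:

$$\forall k : K,\ \mathit{input} : E\_array \cdot \mathit{event}(k) \vee \mathit{event}(\mathit{input}) \qquad (8)$$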


¹ Component semantics are substantially more complex than this simple example. The additional complexity adds nothing to this paper; interested readers should refer to the specific VSPEC papers listed in the bibliography [4, 3].


The attribute event is actually a predicate that is true when the value of its associated symbol has changed since the last state. Thus, the activation condition is true when either the key or the search database changes value.
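The interplay between the event attribute and the activation condition can be sketched as a discrete step function. The sketch is a simplification under stated assumptions: states are compared one step at a time instead of through vhdl delta cycles, the search database is a tuple of (key, payload) pairs, and `Inputs`, `event`, `activation` and `step` are hypothetical names. When the activation condition is false, the component keeps its current output, as the text requires.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Inputs:
    k: int
    elements: Tuple[Tuple[int, str], ...]

def event(prev, curr) -> bool:
    """Models the 'event attribute: true when the value changed since the last state."""
    return prev != curr

def activation(prev: Inputs, curr: Inputs) -> bool:
    """Activation condition of equation 7: k'event or input'event."""
    return event(prev.k, curr.k) or event(prev.elements, curr.elements)

def step(prev: Inputs, curr: Inputs, output: Optional[Tuple[int, str]]):
    """Recompute the output only when active; otherwise hold the current state."""
    if not activation(prev, curr):
        return output
    return next((e for e in curr.elements if e[0] == curr.k), None)

db = ((1, "a"), (2, "b"))
s0 = Inputs(1, db)
out = step(Inputs(0, ()), s0, None)   # inputs changed: the component activates
print(out)                             # (1, 'a')
print(step(s0, s0, out))               # (1, 'a'), no event, so the state is held
print(step(s0, Inputs(2, db), out))    # (2, 'b')
```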

2.3  Architectures 


An architecture is a collection of interacting components. VHDL provides structural descriptions for defining component and process interconnection. vspec uses the same structural descriptions to connect entities annotated with vspec definitions. A vspec structural description is exactly analogous to the structural architectures used in traditional vhdl. Figure 4 shows a vspec component architecture for the search example. This architecture specifies a sort component that prepares input for a binary search component. Figure 5 defines the vspec requirements for the components used in the architecture.

use work.search_types.all;
architecture structure of search is
  component sort
    port (list_in: in E_array;
          list_out: out E_array);
  end component;

  component bin_search
    port (list_in: in E_array;
          k: in K;
          e: out E);
  end component;

  signal x: E_array;
begin
  C1: sort port map (input, x);
  C2: bin_search port map (x, k, output);
end structure;

Figure 4. A candidate architecture for the search example.


vspec's process algebra semantics supports defining bisimulation relationships [14] between single component requirements and vspec component architectures. A vspec architecture is a decomposition of a system into a collection of interconnected components where the requirements of each component are known but an implementation has not yet been defined. A vspec architecture represents a decomposition step in a top-down design process. Bisimulation relationships define when a vspec architecture exhibits behavioral equivalence with its associated requirements, i.e., when they look the same at their interfaces. Thus, using the axiomatic semantics of vspec with its process algebra control semantics, a structural facet (the vspec architecture) can be related to a requirements facet (the component specification).
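For finite-state abstractions, bisimilarity can be decided by starting from the relation containing every pair of states and repeatedly discarding pairs whose moves cannot be matched. The sketch below implements that textbook greatest-fixpoint check for strong bisimulation. It is illustrative only: the labelled transition systems `SPEC`, `ARCH` and `BAD` are hypothetical, and the sketch assumes the internal sort-to-search communication of the architecture has already been hidden. Relating an architecture that still exposes internal steps would require weak bisimulation instead.

```python
from itertools import product

def bisimilar(lts1, s1, lts2, s2):
    """Naive greatest-fixpoint check for strong bisimulation.

    Each LTS maps a state to a list of (label, next_state) transitions.
    """
    rel = set(product(lts1, lts2))  # start from every pair of states
    changed = True
    while changed:
        changed = False
        for p, q in list(rel):
            # Every move of p must be matched by an equally labelled move of q...
            forward = all(any(a == b and (p2, q2) in rel for b, q2 in lts2[q])
                          for a, p2 in lts1[p])
            # ...and every move of q must be matched by a move of p.
            backward = all(any(a == b and (p2, q2) in rel for a, p2 in lts1[p])
                           for b, q2 in lts2[q])
            if not (forward and backward):
                rel.discard((p, q))
                changed = True
    return (s1, s2) in rel

# Hypothetical interface behaviour of the single-component requirements.
SPEC = {"idle": [("query", "busy")], "busy": [("answer", "idle")]}
# Hypothetical architecture with its internal steps already hidden.
ARCH = {"a0": [("query", "a1")], "a1": [("answer", "a2")], "a2": [("query", "a1")]}
# An architecture that can deadlock after accepting a query.
BAD = {"b0": [("query", "b1"), ("query", "b2")], "b1": [("answer", "b0")], "b2": []}

print(bisimilar(SPEC, "idle", ARCH, "a0"))  # True
print(bisimilar(SPEC, "idle", BAD, "b0"))   # False
```

`BAD` agrees with `SPEC` on every finite trace of queries and answers it can complete, yet the bisimulation check rejects it because of the branch into the deadlocked state `b2`; this is the extra discrimination bisimulation provides over trace comparison.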

use work.search_types.all;
entity sort is port
  (list_in: in array of E;
   list_out: out array of E);
  sensitive to list_in'event;
  requires true;
  ensures
    ordered(list_out'post) and
    permutation(list_in, list_out'post);
end sort;

use work.search_types.all;
entity bin_search is port
  (list_in: in array of E;
   k: in K; e: out E);
  sensitive to
    list_in'event or k'event;
  requires ordered(list_in);
  ensures forall f: E
    e'post = f iff
      key(f) = k
      and f ∈ list_in;
end bin_search;

Figure 5. Component specifications for candidate search architecture.


2.4  Performance  Constraints 

Performance  requirements  are  modeled  using  relations 
defined  in  the  constrained  by  clause.  These  relations 
define  constraints  over  a  collection  of  variables  defining 
constraint  types.  The  component  is  required  to  meet  those 
constraints  at  all  times  in  every  state.  Thus,  constraints 
behave  much  like  invariants  with  respect  to  the  axiomatic 
functional  requirements. 

The semantics of constraints can be defined in terms of a component's state. Simply, the constraint predicate must be true for all states:


$$\forall s : S \cdot C(s) \qquad (9)$$


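The constrained by clause from the search example (Figure 1) defines the following constraint predicate:

$$\forall s : S \cdot \mathit{area} \le (3\,\mu\mathrm{m} \times 5\,\mu\mathrm{m}) \qquad (10)$$
$$\wedge\ \mathit{power} \le 10\,\mathrm{mW} \qquad (11)$$
$$\wedge\ \mathit{clock\_frequency} \le 50\,\mathrm{MHz}$$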


In  vspec,  physical  types  behave  like  vhdl  physical 
types.  Thus,  these  relations  define  constraints  on  area, 
power  consumption  and  clock  speed. 
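Because a constraint behaves like an invariant, checking it against a recorded sequence of states amounts to evaluating C(s) in every state. The sketch assumes the states are available as numeric samples; it stands in for vhdl physical types by converting every quantity to a fixed unit (square micrometres, milliwatts, megahertz), and it follows the non-strict ≤ of equations 10 and 11 rather than the strict < written in Figure 1. `PhysicalState`, `search_constraint` and `invariant_holds` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PhysicalState:
    # Constraint variables, normalised to plain numbers in fixed units.
    area_um2: float    # square micrometres
    power_mw: float    # milliwatts
    clock_mhz: float   # megahertz

def search_constraint(s: PhysicalState) -> bool:
    """C(s) from equations 10-11: area <= 3 um * 5 um, power <= 10 mW, clock <= 50 MHz."""
    return s.area_um2 <= 3 * 5 and s.power_mw <= 10 and s.clock_mhz <= 50

def invariant_holds(states, constraint) -> bool:
    """Equation 9: the constraint predicate must be true in every state."""
    return all(constraint(s) for s in states)

trace_ok = [PhysicalState(12.0, 8.5, 40.0), PhysicalState(14.9, 9.9, 50.0)]
trace_bad = trace_ok + [PhysicalState(12.0, 12.0, 40.0)]  # power exceeds 10 mW
print(invariant_holds(trace_ok, search_constraint))   # True
print(invariant_holds(trace_bad, search_constraint))  # False
```

Note that nothing in the check refers to the component's functional behavior, which reflects the difficulty discussed next: constraint variables have no counterpart in the other facets.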

Constraints present special problems when interacting with other facets. Requirements, behavioral and structural facets interact in relatively intuitive ways: providing proper semantics defines clean relationships between facets. Constraints do not share this characteristic. Constraint variables (e.g. heat and area) have no analog in any other facet. Further, it is difficult if not impossible to model the relationship between a functional requirement and a constraint. Constraints have neither a simulation nor a true axiomatic semantics, making relationships difficult to define.

3  VHDL,  VSPEC  and  Systems  Level  Design 

vhdl provides two facets for modeling digital systems: (i) behavioral; and (ii) structural. The entity/architecture pair structure provides a means for associating multiple facets with the same interface. However, vhdl provides only a simulation semantics for representing systems. This limits vhdl's impact in multi-faceted modeling at the systems level.

vspec adds new facets and new modeling paradigms for those facets. The initial objective in designing vspec was to support very high level synthesis. Specifically, capabilities for specifying: (i) declarative functional requirements; and (ii) performance constraints were initially developed. Activation conditions and architectures followed as the need to represent component decomposition became apparent.

vspec demonstrates the effectiveness of multi-faceted modeling. By adding modeling capabilities that do not require operational semantics, support is provided for modeling requirements declaratively. Because requirements define “what” rather than “how”, declarative semantics make sense for requirements modeling. Further, the declarative semantics extends to both performance constraints and function.

Looking back at systems level language requirements and examining vspec and vhdl reveals that several systems level modeling requirements are met. Both vhdl and vspec provide support for multiple modeling paradigms for different component facets. vhdl provides behavioral and structural support using operational semantics. vspec provides functional requirements and performance constraint support using a declarative semantics. vhdl and vspec also support modeling different components in the same system using different computational models. The semantics for achieving this are still somewhat arcane, but they do exist and are usable. Finally, a limited means for moving information between facets exists. vhdl uses a single, common operational semantics, while vspec uses a single, common declarative semantics. Bisimulation provides a link between vhdl and the functional aspects of vspec. Links to and from the constraint aspects of vspec are not as well defined.

4  Moving  VHDL  to  Systems  Level  Design 

Moving vhdl to the systems level involves taking the concepts demonstrated in vspec and: (i) extending them to the general case; and (ii) providing consistent language support. This section describes one possible method to accomplish these goals. This description represents initial thoughts on this topic: none of the vhdl extensions described in this section have been implemented.

Extending the facet concept to more general cases means providing a general structure for supporting facets. vspec currently annotates the entity description because it defines interface requirements; thus, the interface is the most logical place for the description. However, the component interface is not the best place for describing all requirements.

vhdl does provide a structure useful for assigning multiple models to a component. The entity/architecture model allows multiple models to be defined for a given interface. To extend this, language support must be provided for different facets of an entity. Specifically, the architecture is replaced by a facet structure that serves a similar, more general purpose. Figure 6 shows several facets defined for a single entity.

Each facet defined in Figure 6 uses its own computational model. The model is selected based on its appropriateness for the information being represented, and language constructs are provided accordingly. As new facets are identified, they are added to the systems level language using this common syntactic support. The heterogeneous nature of facets makes them substantially different from vhdl architectures, all of which share a common simulation-based semantics.

5  A Potential Semantic Basis

The need to move information between facets is what defines systems engineering activities. How those heterogeneous models interact profoundly influences work at the systems level. Thus, it is important to begin modeling the interaction of facets. The syntax described in the previous section that extends vhdl to the systems level is rather standard. However, mixing computational models within the same language environment presents interesting research challenges. Reconciling information between computational models may be the most difficult of these challenges.

The approach taken by both vhdl and vspec is to define a common semantic basis for all language constructs. vhdl provides a simulation semantics for each system component. vspec provides a declarative, axiomatic semantics for each construct. However, problems tend to arise when migrating information between the two computational models. Modeling inherently operational information declaratively (or vice versa) simply serves to overcomplicate the entire model.

Forcing all system component and facet representations into a single semantic model may cause designers to sacrifice useful design abstractions. For example, the abstractions used to model discrete time systems must be sacrificed if analog time is the basic underlying semantics. The same holds true for any single underlying semantics.

The solution to this problem is modeling how facets interact without resorting to a single model. Information should be visible among facets when and where appropriate. It should remain in the facet where it is modeled and be moved directly into the interacting facet without moving through a universal representation. Figure 7 shows graphically the information flow into a unified representation versus information flow between facets.

System facets and their interactions can be viewed theoretically as institutions [6]. Although it is not proposed that facets be implemented as institutions, using this abstraction potentially aids understanding and modeling information.

Each system facet is a category where: (i) objects are model instances in that facet; and (ii) arrows are homomorphisms between model instances. To satisfy these requirements, a facet must be a formal system consisting of a language, a formal semantics and an inference system. A homomorphism is classically defined as theory containment. Specifically, if a homomorphism exists between two objects, then the theory of the first is contained in the theory of the second. These characteristics result trivially from category theory definitions.

A  category  [15]  C  is  defined  as: 

1. A collection of objects.

2. A collection of morphisms (represented by arrows).

3. Operations assigning to each arrow f a domain object d and a codomain object c. Writing $f : a \rightarrow b$ specifies arrow f with $\mathrm{dom}\, f = a$ and $\mathrm{cod}\, f = b$.

4. A composition operator ($\circ$) assigning to each pair of arrows f and g such that $\mathrm{cod}\, f = \mathrm{dom}\, g$ a composite arrow $g \circ f : \mathrm{dom}\, f \rightarrow \mathrm{cod}\, g$ satisfying the associative law:

$h \circ (g \circ f) = (h \circ g) \circ f$

5. For each object A, an identity arrow $\mathrm{id}_A : A \rightarrow A$ satisfying the identity law, for every arrow $f : A \rightarrow B$:

$\mathrm{id}_B \circ f = f \;\wedge\; f \circ \mathrm{id}_A = f$
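Under the reading above, in which an arrow exists exactly when one model instance's theory is contained in another's, a facet forms a thin category (at most one arrow between any two objects). The Python sketch below checks the identity law, the associative law and closure under composition for a small hypothetical facet; the object names and sentence strings in `THEORIES` are illustrative only and do not come from VSPEC.

```python
from itertools import product

# Hypothetical facet: each model instance maps to its theory, i.e. the set
# of sentences (theorems) that hold for it.
THEORIES = {
    "spec": frozenset({"sorted(out)"}),
    "refined": frozenset({"sorted(out)", "perm(out, in)"}),
    "impl": frozenset({"sorted(out)", "perm(out, in)", "power <= 5 mW"}),
}

def arrows():
    """All arrows a -> b: one per pair whose theories satisfy a <= b."""
    return [(a, b) for a, b in product(THEORIES, repeat=2)
            if THEORIES[a] <= THEORIES[b]]

def identity(a):
    return (a, a)

def compose(g, f):
    """g o f for f: a -> b and g: b -> c (requires cod f == dom g)."""
    if f[1] != g[0]:
        raise ValueError("cod f must equal dom g")
    return (f[0], g[1])

def check_category():
    arrs = arrows()
    for f in arrs:                                   # identity law
        a, b = f
        assert compose(identity(b), f) == f == compose(f, identity(a))
    for f, g in product(arrs, repeat=2):             # closure: containment is transitive
        if f[1] == g[0]:
            assert compose(g, f) in arrs
    for f, g, h in product(arrs, repeat=3):          # associative law
        if f[1] == g[0] and g[1] == h[0]:
            assert compose(h, compose(g, f)) == compose(compose(h, g), f)
    return len(arrs)

print(check_category())  # 6
```

Because arrows here are just (domain, codomain) pairs, associativity holds automatically; the interesting checks are that every identity arrow exists and that composites stay inside the containment relation.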
-- The basic component interface
-- remains the same
entity search is port
  (input: in array of E;
   k: in K;
   output: out E);
end search;

-- A requirements facet containing an
-- axiomatic specification
facet requirements of component is axiomatic
begin
  includes KeyToElement(E, K);
  modifies output;
  sensitive to
    k'event or input'event;
  requires true;
  ensures ∀ e: E
    output'post = e iff
      key(e) = k
      and e ∈ input;
end requirements;

-- A requirements facet containing a
-- performance constraint specification
facet constraints of component is performance
begin
  size < 3 um * 5 um;
  power < 10 mW;
  clock < 50 MHz;
end constraints;

-- A behavioral facet containing behavioral
-- VHDL
facet function of component is behavioral
  variable i: integer;
begin
  for i in input'range loop
    ...
  end loop;
end function;

-- An architecture facet containing structural
-- VHDL
facet architecture of component is structural
  component sort
    port (list_in: in array of E;
          list_out: out array of E);
  end component;
  component bin_search
    port (list_in: in array of E;
          k: in K;
          e: out E);
  end component;
  signal x: array of E;
begin
  C1: sort port map (input, x);
  C2: bin_search port map (x, k, e);
end architecture;


Figure  6.  Some  potential  facets  defined  using  a  VHDL-like  systems  representation. 


Homomorphisms between objects within a category represent changes to design instances where correctness is maintained. Relationships between information domains can be represented by treating those domains as categories and their interrelationships as functors.

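An institution [6] $\mathcal{I}$ is defined as:

1. A category Sign of signatures.

2. A functor $\mathrm{Sen} : \mathrm{Sign} \rightarrow \mathrm{Set}$ giving the set of sentences over a given signature.

3. A functor $\mathrm{Mod} : \mathrm{Sign} \rightarrow \mathrm{Cat}^{op}$ giving the variety of models of a given signature.

4. A satisfaction relation $\models_{\Sigma} \subseteq |\mathrm{Mod}(\Sigma)| \times \mathrm{Sen}(\Sigma)$ for each $\Sigma$ in Sign,

such that for every morphism $\varphi : \Sigma \rightarrow \Sigma'$ in Sign, the satisfaction condition

$m' \models_{\Sigma'} \mathrm{Sen}(\varphi)(e) \iff \mathrm{Mod}(\varphi)(m') \models_{\Sigma} e$

holds for each $m'$ in $\mathrm{Mod}(\Sigma')$ and each $e$ in $\mathrm{Sen}(\Sigma)$.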
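The satisfaction condition can be exercised on the textbook example of propositional logic viewed as an institution, which is not part of this paper: signatures are sets of variable names, a signature morphism φ renames variables, Sen(φ) renames the variables in a formula, and Mod(φ) takes the reduct of a model (a truth assignment m' composed with φ). The Python sketch below, whose function names such as `sen_map` and `mod_map` are hypothetical, checks exhaustively that m' ⊨ Sen(φ)(e) exactly when Mod(φ)(m') ⊨ e for a few sentences.

```python
from itertools import product

# Sentences are nested tuples, e.g. ("or", ("not", ("var", "req")), ("var", "ack")).

def sen_map(phi, e):
    """Sen(phi): translate a sentence by renaming its variables along phi."""
    if e[0] == "var":
        return ("var", phi[e[1]])
    return (e[0],) + tuple(sen_map(phi, sub) for sub in e[1:])

def mod_map(phi, m_prime):
    """Mod(phi): reduct of a Sigma'-model to a Sigma-model (m' composed with phi)."""
    return {p: m_prime[phi[p]] for p in phi}

def sat(m, e):
    """Satisfaction relation m |= e for propositional sentences."""
    tag = e[0]
    if tag == "var":
        return m[e[1]]
    if tag == "not":
        return not sat(m, e[1])
    if tag == "and":
        return sat(m, e[1]) and sat(m, e[2])
    if tag == "or":
        return sat(m, e[1]) or sat(m, e[2])
    raise ValueError(f"unknown connective {tag!r}")

def all_models(signature):
    names = sorted(signature)
    for values in product([False, True], repeat=len(names)):
        yield dict(zip(names, values))

def satisfaction_condition_holds(phi, sigma_prime, sentences):
    return all(
        sat(m_prime, sen_map(phi, e)) == sat(mod_map(phi, m_prime), e)
        for m_prime in all_models(sigma_prime)
        for e in sentences
    )

# Hypothetical signature morphism from Sigma = {req, ack} into a larger Sigma'.
phi = {"req": "start", "ack": "done"}
sigma_prime = {"start", "done", "clk"}
sentences = [
    ("var", "req"),
    ("or", ("not", ("var", "req")), ("var", "ack")),   # req implies ack
    ("and", ("var", "req"), ("not", ("var", "ack"))),
]
print(satisfaction_condition_holds(phi, sigma_prime, sentences))  # True
```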

What the institution defines is a link between theorems in one category and theorems in another. The institution enforces a condition that links facets and forces the information shared between them to remain consistent. Thus, if a theorem in one facet changes, the corresponding theorems in a linked facet must change to keep information consistent between facets.


Institutions provide the necessary formal framework for a semantic definition. The various facets must be modeled formally, as must the functors regulating their interactions. Further, institutions will not form the basis of an efficient implementation. Thus, language structures must be provided that link facets. These language structures can use the institution as their formal basis while providing a more efficient link between two different facets. For example, an institution could be developed that links a requirements facet to a structural facet for a given entity. One possible basis of this institution could be the bisimulation condition defined between VSPEC requirements and VSPEC architectures. Obviously, there is still much work to be done before these concepts can be used to formally describe the relationship between two different facets of a component. However, the institution model appears to be a promising approach.
[Figure 7 diagram: Requirements, Behavioral, Thermal, Structural, Power and Constraints facets connected through a Unified Representation, beside facets connected directly to one another.]

Figure  7.  Information  flow  into  a  universal  representation  vs.  direct  flow  between  facets. 


6.  Related  Work 

Odyssey Research Associates (ORA) is developing Larch/VHDL, an alternative Larch interface language for VHDL [11]. Larch/VHDL is targeted at formal analysis of a VHDL description, and ORA is defining a formal semantics for VHDL using LSL. The LSL representations are used in a traditional theorem prover (Penelope, developed for a similar annotation language for Ada [7]) to verify system correctness. Larch/VHDL annotations are added to a specific VHDL description to represent proof obligations for the verification process. This differs from VSPEC's purpose of representing requirements and design decisions at high levels of abstraction. Further, Larch/VHDL provides only a declarative representation of the operational VHDL semantics. However, the interface language defined by ORA does provide a means for defining requirements much like VSPEC's axiomatic component.

Augustin and Luckham's VAL [2] is another attempt to annotate VHDL for requirements modeling. The purpose of a VAL annotation to a VHDL description is to document the design for verification. VAL provides mechanisms for mapping a behavioral description to a structural description. The annotated description is then transformed into a self-checking VHDL program that is simulated to verify that the two descriptions implement the same function. This is once again slightly different from VSPEC's purpose of high-level requirements representation. Further, VAL's semantics is operational in that it can be transformed into VHDL assertions.

The abstract architecture representation capabilities of VSPEC are also fairly closely related to several architecture description languages that have been developed to describe software architectures [5]. Some of the better-known architecture description languages are UniCon [16], Wright [1] and Rapide [11, 13]. Each of these languages allows the definition of components and connectors to define a software architecture. This is very similar to the VHDL notion of a structural architecture.

Allen and Garlan's Wright language is of particular interest when discussing VSPEC because a Wright component is defined with a variant of CSP. Unlike VSPEC's use of CSP to define component synchronization, Wright uses CSP to define component behavior as well. A Wright description consists of a collection of components interacting via instances of connector types. Wright's CSP descriptions define the sequence of events a component or connector participates in.

7. Conclusions

This paper presented preliminary thoughts on the extension of VHDL to the systems level. First, the systems-level design problem was defined as sharing information between system facets. VHDL supports limited multi-facet modeling but does not provide sufficient flexibility to be called a systems-level design language. Second, VSPEC was presented as a first step towards systems-level design. VSPEC adds information facets to VHDL that support modeling requirements, performance constraints and abstract architectures. Finally, initial syntactic and semantic extensions to VHDL were presented that add a facet construct to the language and model its semantics using institutions. We believe these extensions would be a first step towards making VHDL more suitable for systems-level design.
Abstract

Evaluating architectural design decisions early in the design process is critical for cost-effective design. Formal analysis can provide such evaluation if architectures are defined in a formal way. This paper describes how VSPEC can be used to formally define an architecture during requirements specification. VSPEC is a Larch interface language for VHDL that annotates VHDL entities using the axiomatic style provided by Larch interface languages. Using VHDL's structural definition support, entities described in this manner are connected to form architectural descriptions. Activation conditions over component inputs define when each component must perform its transform. In this paper, we formally define a VSPEC component's state and how component states interact in an architecture. A rudimentary formal semantics for component activation is presented and used to define two potential satisfaction criteria.


1. Introduction

Architectural design decisions made early in a system's design profoundly affect overall design quality. Unfortunately, architecture decisions are rarely evaluated until late in the design process. Simulation-based design languages such as VHDL [17] do not allow evaluation until complete models exist. Such models include not only architectural decisions but also component design decisions. For large systems, simulatable models appear late in the design, driving up the cost of error correction.

* Support for this work was provided in part by the Advanced Research Projects Agency and monitored by Wright Labs under the RASSP Technology Program, contract number F33615-93-C-1316.

A solution to late architecture evaluation is formal analysis of abstract architectures at the requirements level. An abstract architecture is an interconnected collection of components where the requirements of each component are specified without defining their implementation. Thus, an abstract architecture describes a class of solutions rather than a single instance. Instead of waiting for a completed system including design detail, designers can evaluate formally described abstract architectures when architecture decisions are made. VSPEC [3], a Larch interface language [8] for VHDL [17], is a requirements description language that includes formal architecture definition support.

VSPEC describes the requirements of digital system components using the canonical Larch approach and interconnects component descriptions using VHDL's structural definition features. Each VHDL entity is annotated with a pre- and post-condition to indicate the component's functional requirements. VSPEC-annotated entities are connected together using a VHDL structural architecture to form an abstract architecture. The VHDL architecture indicates interconnection in the traditional manner, but the requirements of each component are defined instead of their implementations. An activation condition can be defined to explicitly indicate when a component should execute. Finally, VSPEC allows a designer to describe non-functional requirements critical in selecting among alternative architecture implementations.

This paper describes VSPEC, concentrating on the language's facilities for describing abstract architectures. Section 2 provides a brief summary of the VSPEC language. Section 3 describes VSPEC abstract architectures, including a definition of the VSPEC state model and a description of how a process algebra (CSP [9]) is used to provide a semantics for the VSPEC activation condition. Section 4 discusses how these semantics can be used to verify that an abstract architecture satisfies the specification of the entity. The paper concludes with a discussion of related work.

2. A Brief Summary of VSPEC

VSPEC is a requirements specification language for digital systems. As a requirements specification language, it is used very early in the design process to describe “what” a digital system must do. The operational style of VHDL makes VHDL alone ill-suited for requirements specification. It forces a designer to describe a system by defining a specific design artifact that describes “how” the system behaves. Using VHDL as a requirements specification language forces a designer to deal with unnecessary detail at an early point in the design process.

In contrast to VHDL's operational style, VSPEC allows a designer to describe a component declaratively. A VSPEC description of a sorting component is shown in Figure 1. As with most other Larch interface languages, the requires and ensures clauses are used to state the pre- and post-conditions of the component. The sort component has a pre-condition of true, which means it will function correctly for any set of inputs. The post-condition states that the output contains all the same elements as the input (i.e., permutation(output'post, input)) and that the output is in order. Any implementation of a sorting component that makes this post-condition true in the next state is a valid implementation of these requirements. More generally, given a component with requires predicate I(St) and ensures predicate O(St, St'post), f(St) is an implementation of the requirements if the following condition holds:

$\forall St \cdot I(St) \Rightarrow O(St, f(St)) \qquad (1)$

In addition to allowing a designer to describe “what” a component does, VSPEC also addresses another shortcoming of VHDL: it allows a designer to specify performance constraints in a consistent fashion. The VSPEC constrained by clause is used for this purpose. As shown in Figure 1, this clause defines relations over constraint variables. Currently, the defined constraint variables include power consumption, layout area (expressed as a bounding box), heat dissipation, clock speed and pin-to-pin timing. Constraint theories written in LSL define each constraint type. Users may define their own constraints and theories if desired.
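A constrained by clause is a conjunction of relations over constraint variables, so checking a candidate implementation amounts to evaluating each relation against estimated values. The Python sketch below encodes the Figure 1 constraints as tuples, assuming the pin-to-pin bound is in microseconds, and normalizes every quantity to SI base units before comparing. The unit table, tuple encoding and candidate estimates are hypothetical; VSPEC itself defines these constraint types through LSL theories rather than code.

```python
import operator

# Hypothetical unit table: scale factors to SI base units.
UNITS = {"mW": 1e-3, "MHz": 1e6, "us": 1e-6, "um2": 1e-12}
OPS = {"<=": operator.le, "<": operator.lt}

# Figure 1's constrained by clause as (variable, relation, bound, unit).
SORT_CONSTRAINTS = [
    ("power", "<=", 5, "mW"),
    ("size", "<=", 3 * 5, "um2"),        # bounding box 3 um * 5 um
    ("heat", "<=", 10, "mW"),
    ("clock", "<=", 50, "MHz"),
    ("input<->output", "<=", 5, "us"),   # pin-to-pin timing (assumed microseconds)
]

def violations(constraints, estimates):
    """Return the names of the constraints a candidate's estimates violate."""
    bad = []
    for var, rel, bound, unit in constraints:
        value, value_unit = estimates[var]
        if not OPS[rel](value * UNITS[value_unit], bound * UNITS[unit]):
            bad.append(var)
    return bad

candidate = {
    "power": (4.2, "mW"),
    "size": (12.0, "um2"),
    "heat": (11.0, "mW"),
    "clock": (40, "MHz"),
    "input<->output": (3.5, "us"),
}
print(violations(SORT_CONSTRAINTS, candidate))  # ['heat']
```

Normalizing units before comparing keeps a designer free to state estimates in whatever unit is convenient, which matters once user-defined constraint types are added.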

The state clause contains a list of variable declarations that define the internal state of a component. These variables maintain state information that may not be recorded by the values of the component's ports. A state clause is not needed in the sorting component specification in Figure 1, but an example of this clause can be found in the Move Machine description [3].

The modifies clause lists variables, ports and signals whose values may be changed by the entity. Most other Larch interface languages contain a modifies clause, and the definition of the VSPEC modifies clause is very similar to the definitions found in these languages [4, 7, 12]. The includes clause is used to include Larch Shared Language (LSL) definitions in a VSPEC description. The sorts and operators defined in the LSL trait named by the includes clause can be used in the VSPEC definition. In this example, the SortOps trait defines two predicates: permutation and sorted.
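In Larch interface languages the modifies clause is commonly read as a frame condition: anything not listed may not change between the pre-state and the post-state. A minimal Python sketch of that reading, with states represented as hypothetical dictionaries from names to values:

```python
def respects_modifies(pre, post, modifies):
    """Frame condition: every variable not listed in modifies keeps its value."""
    if pre.keys() != post.keys():
        raise ValueError("pre- and post-states must name the same variables")
    return all(pre[v] == post[v] for v in pre if v not in modifies)

pre = {"input": [3, 1, 2], "output": []}
ok_post = {"input": [3, 1, 2], "output": [1, 2, 3]}
bad_post = {"input": [1, 2, 3], "output": [1, 2, 3]}   # also rewrote input

print(respects_modifies(pre, ok_post, {"output"}))   # True
print(respects_modifies(pre, bad_post, {"output"}))  # False
```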

The sensitive to clause plays the same role in a VSPEC definition that sensitivity lists and wait statements play in a VHDL description: it defines when a component is active. The sensitive to clause for sort in Figure 1 states that the entity activates (and sorts its input) whenever the input changes. The sensitive to clause contains a predicate indicating when an entity should begin executing. The next section gives a more precise semantics for the sensitive to predicate.

3.  Abstract  Architectures 

VHDL structural architectures composed of VSPEC-annotated components specify abstract architectures. The VHDL architecture remains unchanged, indicating component instantiation and connections. However, the configuration does not assign an entity/architecture pair to each component instance in the architecture. Instead, the configuration states that each component references an entity with an architecture called VSPEC. This signifies that, at the current point in the design, the requirements of this component are known (via the VSPEC description) but no implementation has been defined.

Consider the VSPEC description of a find component shown in Figure 2a. The output of find is the element from the input array with the same key as the k input. This requirement is represented by find's ensures clause. One possible way to meet this requirement is to connect the output of a sorting component to a binary search component, as shown in Figure 3. The specification for sort is the same as the one in Section 2, while the bin_search specification is shown in Figure 2b. The only difference between this structural description of find and a VHDL structural description of find is that the configuration specifies that the VSPEC descriptions of sort and bin_search should be used instead of a specific architecture for these two entities. This configuration describes an abstract architecture for the find component. Any implementation satisfying the VSPEC requirements of sort and bin_search may be associated with these entity definitions. The abstract architecture for find defines a class of solutions with a common structure.

entity sort is port
  (input: in integer_array;
   output: out integer_array);
  includes SortOps;
  modifies output;
  sensitive to input'event;
  requires true;
  ensures
    permutation(output'post, input) and
    sorted(output'post);
  constrained by
    power <= 5 mW and size <= 3 um * 5 um
    and heat <= 10 mW and clock <= 50 MHz
    and input<->output <= 5 us;
end sort;

[Figure 1 diagram: the sort component with input and output ports, Vcc and Clk connections, and a Time axis.]

Figure 1. VSPEC description of a sorting component.

Although a VHDL architecture referencing VSPEC definitions defines components and interconnections, additional information must be added to specify when the VSPEC components activate. In traditional sequential programming, a language construct “executes” following termination of the construct preceding it. For correct execution, a construct's pre-condition must be satisfied when the preceding construct terminates. In hardware systems, components exist simultaneously and behave as independent processes; no predefined execution order exists, so there is no means of implicitly determining when a component's pre-condition should hold.

VHDL provides sensitivity lists and wait statements to synchronize entity execution and define when a component in a structural architecture is active. VSPEC achieves the same end using the sensitive to clause. The sensitive to clause contains a predicate called the activation condition that indicates when an entity should begin executing. Effectively, this activation condition defines when a VSPEC-annotated entity's pre-condition must hold. When the sensitive to predicate is true, the pre-condition must hold and the implementation must satisfy the post-condition. When the sensitive to predicate is false, the entity makes no contribution to the state of the system. In the find example, both components activate when any of their input signals change.
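The activation rule just described (active: the pre-condition must hold and the post-condition must be established; inactive: no contribution) can be sketched as a single step function. In this hypothetical Python model a component is given as three callables, and an implementation function f that satisfies Equation 1 stands in for the ensures predicate; raising an exception when the pre-condition fails is a modeling choice, not VSPEC semantics.

```python
def step(prev, cur, activation, requires, implementation):
    """One step of a VSPEC-style component (hypothetical model).

    activation(prev, cur) plays the role of the sensitive to predicate,
    requires(cur) the pre-condition, and implementation(cur) an f that
    satisfies the ensures clause.
    """
    if not activation(prev, cur):
        return cur                       # inactive: no contribution to the state
    if not requires(cur):
        raise ValueError("activated in a state that violates the pre-condition")
    return implementation(cur)

def input_event(prev, cur):
    # Models "sensitive to input'event": input changed since the previous state.
    return prev["input"] != cur["input"]

def always_true(state):
    return True

def sort_impl(state):
    return {**state, "output": sorted(state["input"])}

s0 = {"input": [2, 1], "output": [1, 2]}
s1 = {"input": [3, 1, 2], "output": [1, 2]}
print(step(s0, s1, input_event, always_true, sort_impl))  # output becomes [1, 2, 3]
print(step(s1, s1, input_event, always_true, sort_impl))  # unchanged
```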


Formally, the contribution of the sensitive to clause to the transformation specified by VSPEC is easily represented using a traditional process algebra such as CSP [9]. Components become processes, and events are defined as the states a component enters. Thus, any VSPEC component can be described by a process that consumes states and generates a process in a new state. To define such state changes, a component state is defined along with a means for combining component states into an architecture state.

The formal VSPEC model of the state of a component is based on Chalin's state model [4, Chapter 6] for LCL. This model partitions the computational state of an LCL description into an environment and a store [19]. The environment maps (variable) identifiers to objects, and the store binds objects to the values they contain:


$\mathit{Env} == \mathit{Id} \rightarrow \mathit{Obj} \qquad (2)$

$\mathit{Store} == \mathit{Obj} \rightharpoonup \mathit{Value} \qquad (3)$


Separating the environment and the store in this fashion is common among formal models of program state. In a language such as LCL, a motivating factor for this is to allow multiple names for the same element of memory. For example, two C pointers can obviously reference the same memory location. The program state model above represents this situation by mapping each of these pointers to the same object in the Env map.

This partitioning of component state is used in the VSPEC state model. In addition to allowing the correct representation of VHDL access types, this partitioning also allows the state of an abstract architecture to be represented more easily. For a single VSPEC-specified component, Env contains a map from each port and state variable in the VSPEC description to an object. Store maps each of these objects to its current value. We call this the abstract state of the VSPEC component.

entity find is port
  (input: in element_array;
   k: in keytype;
   output: out element);
  includes Element(element, keytype,
                   element_array);
  modifies output;
  sensitive to
    input'event or k'event;
  requires true;
  ensures forall (e : element)
    (output = e implies
      (e.key = k
       and elem_of(e, input)));
  constrained by
    power <= 5 mW
    and size <= 3 um * 5 um
    and k<->output <= 5 us
    and heat <= 10 mW
    and clock <= 50 MHz;
end find;

entity bin_search is
  port (input: buffer element_array;
        k: in integer;
        value: out element);
  modifies value;
  sensitive to
    input'event or k'event;
  requires sorted(input);
  ensures value = e iff (e.key = k and
    elem_of(e, input));
  constrained by
    power <= 1 mW and
    size <= 1 um * 2 um;
end bin_search;

Figure 2. VSPEC descriptions of find and binary search components.

When VSPEC components are connected together to form an abstract architecture, the elements of Env and Store are slightly different. The Store contains objects for each port in the architecture's entity, for each signal in the architecture and for the state variables of each component in the architecture. The Env maps each of these three types of element to the proper object, but it also maps the ports of each architecture component to the object that represents the architecture signal the port is connected to. We call the state model of an abstract architecture the concrete state of the component.

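In the simple two-component example of Figure 4, the abstract states of system, A and B are:

$\mathit{Env}_{system} = \{sys\_in \mapsto obj_{sys\_in},\ sys\_out \mapsto obj_{sys\_out}\}$

$\mathit{Store}_{system} = \{obj_{sys\_in} \mapsto v_{sys\_in},\ obj_{sys\_out} \mapsto v_{sys\_out}\}$

$\mathit{Env}_A = \{x \mapsto obj_x,\ y \mapsto obj_y\}$

$\mathit{Store}_A = \{obj_x \mapsto v_x,\ obj_y \mapsto v_y\}$

$\mathit{Env}_B = \{w \mapsto obj_w,\ z \mapsto obj_z\}$

$\mathit{Store}_B = \{obj_w \mapsto v_w,\ obj_z \mapsto v_z\}$

The concrete state of the struct architecture is:

$\mathit{Env}_{struct,system} = \{sys\_in \mapsto obj_{sys\_in},\ sys\_out \mapsto obj_{sys\_out},\ c \mapsto obj_c,\ x \mapsto obj_{sys\_in},\ y \mapsto obj_c,\ w \mapsto obj_c,\ z \mapsto obj_{sys\_out}\}$

$\mathit{Store}_{struct,system} = \{obj_{sys\_in} \mapsto v_{sys\_in},\ obj_{sys\_out} \mapsto v_{sys\_out},\ obj_c \mapsto v_c\}$

Notice that x, y, w and z now map to the objects containing the signal values the component ports are connected to.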
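The construction of the concrete state can be sketched in Python for the Figure 4 system. The encoding below is hypothetical: each entity port and signal gets its own object, and each component port is mapped to the object of the signal or entity port it connects to, which reproduces the bindings of x, y, w and z described above. Component state variables are omitted because A and B declare none, and the sketch assumes port names are unique across component instances.

```python
def concrete_state(entity_ports, signals, components, port_maps, values):
    """Build (Env, Store) for an abstract architecture (hypothetical encoding).

    components: {instance: [formal port names]}
    port_maps:  {instance: [actual names, positionally]}
    values:     {entity port or signal: current value}
    """
    env, store = {}, {}
    # One object per entity port and per architecture signal.
    for name in entity_ports + signals:
        obj = f"obj_{name}"
        env[name] = obj
        store[obj] = values[name]
    # Component ports alias the object of whatever they are connected to.
    for inst, formals in components.items():
        for formal, actual in zip(formals, port_maps[inst]):
            env[formal] = env[actual]
    return env, store

env, store = concrete_state(
    entity_ports=["sys_in", "sys_out"],
    signals=["c"],
    components={"c1": ["x", "y"], "c2": ["w", "z"]},
    port_maps={"c1": ["sys_in", "c"], "c2": ["c", "sys_out"]},
    values={"sys_in": 7, "sys_out": 0, "c": 0},
)
print(env["x"], env["y"], env["w"], env["z"])
# obj_sys_in obj_c obj_c obj_sys_out
```

Because y and w resolve to the same object, a value written through A's output port is immediately visible at B's input port, which is how a change propagates through the architecture.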

The semantics of a VSPEC entity are defined by a CSP process that defines the sequence of states the entity passes through. Let C be an entity with sensitive to, requires and ensures predicates S(St), I(St) and O(St, St'post), respectively. The process defining C in any state r is:

$C_r = (r : \Phi \rightarrow C_{r'post}) \qquad (4)$

where $\Phi = \{t \mid S(t)\}$ is the set of states that satisfy C's activation condition and $P_x$ is the process P in some state x. $O(r, r'post)$ must hold to ensure the transformation's correctness. Thus, when an external force changes the abstract state to one that satisfies the entity's activation condition (r in Equation 4), the process will consume r and behave like $C_{r'post}$. A trace of the process defined by a VSPEC entity is a sequence of abstract states the entity enters. Each of these states satisfies C's activation condition. Thus, the alphabet of C is equal to Φ.

If f(St) implements the requirements specified by I(St) and O(St, St'post) (i.e., f(St) satisfies Equation 1), Equation 4 can be rewritten as:

$C_r = (r : \Phi \rightarrow C_{f(r)}) \qquad (5)$

In this situation, the process consumes r and f is applied to r to generate a new abstract state. The entity then behaves like the process defined by $C_{f(r)}$.

CSP's concurrency operator combines component processes to define the behavior of a VSPEC architecture. Let $C_1, C_2, \ldots, C_n$ be the processes represented by Equation 4 or 5 for the set of VSPEC component instances in architecture A. The process representing architecture A is:

$A = C_1 \parallel C_2 \parallel \cdots \parallel C_n \qquad (6)$

When the current state satisfies some component's activation condition, the component performs its specified transformation on its abstract state. This change is propagated to the state of the architecture, where the activation condition of another component may be satisfied. This causes the process to repeat until the system changes to a concrete state where no component's activation condition is satisfied. The system then waits until some external source changes the concrete state to one that activates some component in the architecture to start the process again.

In the CSP model of a VSPEC process, this notion can be understood by examining the possible traces of A from Equation 6. Hoare [9] defines traces over parallel composition, $\mathrm{traces}(C_1 \parallel C_2)$, as:

$\mathrm{traces}(C_1 \parallel C_2) = \{t \mid (t \upharpoonright \alpha C_1) \in \mathrm{traces}(C_1) \wedge (t \upharpoonright \alpha C_2) \in \mathrm{traces}(C_2) \wedge t \in (\alpha C_1 \cup \alpha C_2)^*\}$

Thus, the traces of a parallel composition of components are all traces that, when restricted to the alphabet of each component, yield a trace of that component. Furthermore, traces of $C_1 \parallel C_2$ only contain events from the alphabet of either component. Thus, every trace of A contains only states that satisfy the activation condition of at least one component in A.

If A enters a state where no component's activation condition is true, it will wait for a change on one of its input ports. Sequences in traces(A) contain only states that activate a component of A, so the process representing A only consumes those states. However, a change to a component's input port also causes a state change, and inactive components must wait for events from external sources to initiate activation. Thus, traces(A) is not strictly the set of all states a component may enter, but the set of all states a component enters from active states.

architecture structure of find is
  component sorter
    port (input: in element_array;
          output: out element_array);
  end component;
  component searcher
    port (input: in element_array;
          key: in integer;
          value: out element);
  end component;
  signal y: element_array;
begin
  b1: sorter port map (input, y);
  b2: searcher port map (y, k, output);
end structure;

Figure 3. A VSPEC abstract architecture representation of the find component.

4. System Verification

This section describes how the CSP semantics of a VSPEC abstract architecture can be used to verify that an abstract architecture for an entity satisfies the VSPEC specification of the entity. Many satisfaction
configuration  test_vspec  of  find  is 
for  structure 

for  bl: sorter -use  entity 
work. sort (VSPEC) ; 
end  for; 

for  b2: searcher  use  entity 
work  .bin_search  (VSPEC) 
end  for 
end  for; 

end  test_struct; 


entity A is port
  (x : in integer;
   y : out integer);
  requires Ia(x);
  ensures Oa(x, y'post);
  modifies y;
end A;

entity B is port
  (w : in integer;
   z : out integer);
  requires Ib(w);
  ensures Ob(w, z'post);
  modifies z;
end B;


architecture struct of system is
  component A
    port (x : in integer;
          y : out integer);
  end component;
  component B
    port (w : in integer;
          z : out integer);
  end component;
  signal c : integer;
begin
  c1: A port map(sys_in, c);
  c2: B port map(c, sys_out);
end struct;


entity  system  is  port 
(sys_in  :  in  integer; 
sys_out  :  out  integer) ; 
end  system; 


Figure  4.  Example  of  two  entities  connected  serially.1 


criteria could be specified and checked. Here, two examples are considered: (1) weak bisimulation; and (2) trace equivalence. Weak bisimulation will evaluate the final state of a halting system. Trace equivalence will look at traces from systems that do not halt.

Satisfaction criteria will be evaluated by comparing the abstract states from the problem definition with concrete states of the abstract architecture. To make this comparison possible, an abstraction function that maps concrete states to their abstract equivalents must be defined. We call this function abs and note that a concrete state c is equivalent to abstract state a if and only if abs(c) = a.

The most traditional correctness criterion used to verify that an abstract architecture implements its specification is weak bisimulation [15]. A weak bisimulation (or simply bisimulation) condition holds when a sequence of states in the concrete model produces a desired single state change specified by the abstract model (see Figure 5). Only the first and last states of the concrete state sequence are significant. The specific state sequence leading from the initial concrete state to the final concrete state is ignored.

Figure 5. Concrete state changes associated with a single abstract state change.

Equation 7 is a weak bisimulation correctness obligation for showing that architecture A satisfies a single abstract state change specification. Here, $\Sigma_A$ is the set of concrete states where the activation condition of at least one component in A is true. The obligation states that for all concrete state traces starting in a state whose abstract projection satisfies the abstract specification's pre-condition, either the abstract projection of the final process state satisfies the post-condition or the process can consume the state and continue.

$$\forall t : traces(A) \bullet I(abs(t_0)) \wedge last(t) = s \Rightarrow (O(abs(t_0), abs(s)) \vee s \in \Sigma_A) \qquad (7)$$
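The obligation in Equation 7 can be checked mechanically on a finite set of finite traces. The Python sketch below is one way to do that under stated assumptions: traces are lists of dictionary states, abs keeps only the external ports, and the abstract requirement for the Figure 4 system is taken to be sys_out' = 2 * (sys_in + 1). That requirement, the state variables `a_seen`/`b_seen`, and the helper name `weak_bisim_obligation` are hypothetical; checking finitely many traces is a test, not a proof.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Concrete = Dict[str, int]
Abstract = Tuple[int, int]


def weak_bisim_obligation(traces: Iterable[List[Concrete]],
                          abs_fn: Callable[[Concrete], Abstract],
                          pre: Callable[[Abstract], bool],
                          post: Callable[[Abstract, Abstract], bool],
                          in_sigma_a: Callable[[Concrete], bool]) -> bool:
    """Check Equation 7 on a finite collection of finite concrete traces."""
    for t in traces:
        if not t:
            continue  # the empty trace has no t_0 to constrain
        a0, s = abs_fn(t[0]), t[-1]
        if pre(a0) and not (post(a0, abs_fn(s)) or in_sigma_a(s)):
            return False
    return True


# Hypothetical data for the Figure 4 system.
abs_fn = lambda s: (s["sys_in"], s["sys_out"])
pre = lambda a: True
post = lambda a0, a1: a1[1] == 2 * (a0[0] + 1)
in_sigma_a = lambda s: s["sys_in"] != s["a_seen"] or s["c"] != s["b_seen"]

run = [
    {"sys_in": 5, "c": 0, "sys_out": 0, "a_seen": 0, "b_seen": 0},
    {"sys_in": 5, "c": 6, "sys_out": 0, "a_seen": 5, "b_seen": 0},
    {"sys_in": 5, "c": 6, "sys_out": 12, "a_seen": 5, "b_seen": 6},
]
prefixes = [run[:k] for k in range(1, len(run) + 1)]  # CSP traces are prefix-closed
print(weak_bisim_obligation(prefixes, abs_fn, pre, post, in_sigma_a))  # True
```

The intermediate prefixes pass only because their last state is still in $\Sigma_A$; the full trace passes because its final state meets the post-condition.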

For systems with clearly defined halting or pausing points, Equation 7 is an appropriate correctness criterion. However, many systems run continuously. Their states are observable, but there is no notion of pausing or halting to synchronize abstract state comparison. To formulate the correctness criterion for these types of systems, a concept similar to bisimulation is applied to sequences of states rather than a single state change.

Traces can be derived from the abstract requirement specification by defining process R in state S as $R_S = (S : \Sigma_R \rightarrow R_{S'^{post}})$ in the same manner as the concrete requirements. Such traces are exactly one event long when a single state change is defined. However, if the resulting state satisfies the component's activation condition, then the process will continue to consume states. Thus, $traces(R_S)$ is the set of finite abstract state sequences defined for process R. With this, traces through both the abstract requirements and the concrete specification are defined.

Figure 6. Concrete state changes associated with multiple abstract state changes.
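The image and reduce functions defined in the next paragraph, and the reduced-trace correctness check built on them, can be sketched in Python. Assumptions: traces(R) is given as an explicit, finite, prefix-closed set of tuples of abstract states; image is applied before reduce so that concrete traces are compared in the abstract domain; and the helper is named `reduce_trace` to avoid confusion with Python's `functools.reduce`. The Figure 4 data below are hypothetical.

```python
from itertools import groupby
from typing import Callable, Dict, Hashable, Sequence, Set, Tuple


def image(trace: Sequence[Dict[str, int]],
          abs_fn: Callable[[Dict[str, int]], Hashable]) -> Tuple[Hashable, ...]:
    """Apply the abstraction function to every element of a concrete trace."""
    return tuple(abs_fn(e) for e in trace)


def reduce_trace(trace: Sequence[Hashable]) -> Tuple[Hashable, ...]:
    """Collapse each run of adjacent equal states into a single state."""
    return tuple(state for state, _ in groupby(trace))


def reduced_trace_correct(concrete_traces, abs_fn, traces_r: Set[Tuple]) -> bool:
    """Every concrete trace must reduce to a legal abstract trace of R."""
    return all(reduce_trace(image(t, abs_fn)) in traces_r for t in concrete_traces)


print(reduce_trace(("a", "b", "a", "a", "c", "c", "c")))  # ('a', 'b', 'a', 'c')

# Hypothetical Figure 4 run: abs keeps only the external ports.
abs_fn = lambda s: (s["sys_in"], s["sys_out"])
run = [
    {"sys_in": 5, "c": 0, "sys_out": 0},
    {"sys_in": 5, "c": 6, "sys_out": 0},   # the internal change to c is invisible
    {"sys_in": 5, "c": 6, "sys_out": 12},
]
traces_r = {(), ((5, 0),), ((5, 0), (5, 12))}  # assumed prefix-closed traces of R
prefixes = [run[:k] for k in range(len(run) + 1)]
print(reduced_trace_correct(prefixes, abs_fn, traces_r))  # True
```

`itertools.groupby` groups only adjacent equal items, which is exactly the "replace adjacent equivalent states" behavior; non-adjacent repeats such as the two a's are kept.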

The image of a trace with respect to an abstraction function, abs, is simply the abstraction function applied to each trace element: $image(\langle e_0, e_1, \ldots, e_n \rangle) = \langle abs(e_0), abs(e_1), \ldots, abs(e_n) \rangle$. The reduce function eliminates invisible state changes by replacing adjacent equivalent states in a trace with a single state. For example, $reduce(\langle a, b, a, a, c, c, c \rangle) = \langle a, b, a, c \rangle$.

A concrete specification is correct with respect to reduced abstract equivalence if:

$$\forall t : traces(P) \bullet reduce(t) \in traces(R) \qquad (8)$$

In this case, an architecture specification is correct if every trace of concrete states can be reduced to a legal trace of abstract states. Reducing the state sequence removes concrete state changes that are not observable in the external state. It should be noted that the component semantics thus far specifies only liveness properties (what the system must do) and largely ignores safety properties (what the system must not do) [11]. The weak bisimulation semantics specifies only characteristics of the resultant state and by definition ignores characteristics of intermediate states. This should not be viewed as a fatal flaw, because this is precisely what traditional block diagrams define. Some methodologies may extend the block diagram approach to include safety properties, but the traditional diagram specifies only what must happen and when it must happen.

5. Related Work

Odyssey Research Associates (ORA) is developing Larch/VHDL, an alternative Larch interface language for VHDL [10]. Larch/VHDL is targeted at formal analysis of a VHDL description, and ORA is defining a formal semantics for VHDL using LSL. The LSL representations are used in a traditional theorem prover (Penelope, developed for a similar annotation language for Ada [6]) to verify system correctness. Larch/VHDL annotations are added to a specific VHDL description to represent proof obligations for the verification process. This differs from VSPEC's purpose of representing requirements and design decisions at high levels of abstraction.

Augustin and Luckham's VAL [2] is another attempt to annotate VHDL. The purpose of a VAL annotation to a VHDL description is to document the design for verification. VAL provides mechanisms for mapping a behavioral description to a structural description. Two VAL/VHDL descriptions of a design can be transformed into a self-checking VHDL program that is simulated to verify that the two descriptions implement the same function. This is once again slightly different from VSPEC's purpose of high level requirements representation.

The abstract architecture representation capabilities
of VSPEC are also fairly closely related to several architecture description languages that have been developed to describe software architectures [5]. Some of the better known architecture description languages are UniCon [18], Wright [1] and Rapide [13, 14]. Each of these languages allows the definition of components and connectors to define a software architecture. This is very similar to the VHDL notion of a structural architecture.

Allen and Garlan's Wright language is of particular interest when discussing VSPEC because a Wright component is defined with a variant of CSP. Unlike VSPEC's use of CSP to define component synchronization, Wright uses CSP to define component behavior as well. A Wright description consists of a collection of components interacting via instances of connector types. Wright's CSP descriptions define the sequence of events a component or connector participates in.


6. Conclusions

This paper presented VSPEC, a requirements specification language for VHDL, emphasizing VSPEC architecture representation. A VSPEC specification describes the pre-condition, post-condition, performance constraints and activation condition of a VHDL entity. When the activation condition is true, the entity's pre-condition must hold and the entity is responsible for making its post-condition hold in the next state. The semantics of a single component VSPEC specification is based on the canonical Larch axiomatic approach, while CSP is used to define the semantics of an architecture of components.

Two satisfaction criteria used to verify that an architecture is a refinement of a requirements specification were discussed here: weak bisimulation and trace equality. Weak bisimulation evaluated an architecture's halting state with respect to a requirements specification. Trace equality compared state traces from systems that do not halt. These mechanisms allow an architectural description to be formally analyzed at the requirements level.

At the present time, the first version of the language definition is complete. A VSPEC parser that type checks expressions by calling an LSL parser has been implemented. Constraint theories for the five basic constraints (power, area, heat dissipation, clock speed and pin to pin timing) have been developed. The formal semantics of a single component VSPEC specification based on the canonical Larch approach is complete, as is the first cut at the semantics of an abstract architecture using CSP. Several specifications using these techniques have been developed, but further investigation into architecture semantics is needed.

The main area of future work for VSPEC is to refine the semantics of an abstract architecture of VSPEC components. The CSP semantics presented in this paper are useful, but we may investigate using a different process algebra such as CCS [16] to describe architectures. The main reason for this is that weak bisimulation was originally formulated using CCS, and it may be more natural to reason about weak bisimulation using this process algebra.

One of the primary goals of this research is to provide a mechanism that allows the effects of architecture decisions to be evaluated earlier in the design process. VSPEC accomplishes this goal by allowing components in an architecture to be described using a traditional axiomatic specification and formally modeling the interactions between components using a process algebra. This approach allows architecture decisions to be evaluated at the requirements level, which should improve overall design quality.

References 

[1] R. Allen and D. Garlan. Formalizing Architectural Connection. In Proc. Sixteenth International Conference on Software Engineering, pages 71-80, May 1994.

[2] L. Augustin, D. Luckham, B. Gennart, Y. Huh, and A. Stanculescu. Hardware Design and Simulation in VAL/VHDL. Kluwer Academic Publishers, Boston, MA, 1991.

[3] P. Baraona, J. Penix, and P. Alexander. VSPEC: A Declarative Requirements Specification Language for VHDL. In J.-M. Bergé, O. Levia, and J. Rouillard, editors, High-Level System Modeling: Specification Languages, volume 3 of Current Issues in Electronic Modeling, chapter 3, pages 51-75. Kluwer Academic Publishers, Boston, MA, 1995.

[4] P. Chalin. On the Language Design and Semantic Foundation of LCL, a Larch/C Interface Specification Language. PhD thesis, Concordia University, Department of Computer Science, Montreal, Quebec, Canada, December 1995.

[5] D. Garlan and M. Shaw. An Introduction to Software Architecture. In V. Ambriola and G. Tortora, editors, Advances in Software Eng. and Knowledge Eng., volume 2, pages 1-39. World Scientific, New York, 1993.

[6] D. Guaspari. Penelope: An Ada Verification System. In Proceedings of Tri-Ada '89, pages 216-224, Pittsburgh, October 1989.

[7] J. V. Guttag and J. J. Horning. Introduction to LCL: A Larch/C Interface Language. Technical Report 74, Digital Equipment Corporation Systems Research Center, 130 Lytton Avenue, Palo Alto, CA 94301, July 1991.

[8] J. V. Guttag and J. J. Horning. Larch: Languages and Tools for Formal Specification. Springer-Verlag, New York, NY, 1993.

[9] C. A. R. Hoare. Communicating Sequential Processes. Prentice-Hall, Englewood Cliffs, 1985.

[10] D. Jamsek and M. Bickford. Formal Verification of VHDL Models. Technical Report RL-TR-94-3, Rome Laboratory, Griffiss Air Force Base, NY, March 1994.

[11] L. Lamport. A Simple Approach to Specifying Concurrent Systems. Communications of the ACM, 32(1):32-45, January 1989.

[12] G. T. Leavens. Larch/C++ reference manual. Available at: ftp://ftp.cs.iastate.edu/pub/larchc++/, 1995.

[13] D. Luckham, J. Kenney, L. Augustin, J. Vera, D. Bryan, and W. Mann. Specification and Analysis of System Architecture Using Rapide. IEEE Transactions on Software Engineering, 21(4):336-355, April 1995.

[14] D. Luckham and J. Vera. An Event-Based Architecture Definition Language. IEEE Transactions on Software Engineering, 21(9):717-734, September 1995.

[15] R. Milner. A Calculus of Communicating Systems, volume 92 of Lecture Notes in Computer Science. Springer-Verlag, Berlin, 1980.

[16] R. Milner. Communication and Concurrency. International Series in Computer Science. Prentice Hall, New York, NY, 1989.

[17] D. Perry. VHDL. McGraw-Hill, New York, NY, 1st edition, 1991.

[18] M. Shaw, R. DeLine, D. Klein, T. Ross, D. Young, and G. Zelesnik. Abstractions for Software Architecture and Tools to Support Them. IEEE Transactions on Software Engineering, 21(4):314-335, April 1995.

[19] R. Tennent. Principles of Programming Languages. Computer Science Series. Prentice-Hall International, 1981.




APPENDIX  0: 

Formal  Representations  for  Abstract  System  Evaluation 

Perry  Alexander 

Department  of  Electrical  &  Computer  Engineering  and  Computer  Science 
PO  Box  210030  The  University  of  Cincinnati 
Cincinnati,  OH 
alex@ececs.uc.edu


Abstract 

Evaluating design decisions early in the design process is critical for cost-effective design. Formal analysis can provide such evaluation if architectures are defined in a formal way. VSPEC is a Larch interface language for VHDL that annotates VHDL entities using the axiomatic style provided by Larch interface languages. Using VHDL's structural definition support, entities described in this manner can be connected to form architectural descriptions. Activation conditions over component inputs define when the component must perform its transform. In this paper, we provide a simple introduction to VSPEC and its mechanisms for describing system architectures.

1  Introduction 

Design decisions made early in a system's design profoundly affect overall design quality. Unfortunately, such decisions are rarely evaluated until late in the design process. Simulation-based design languages such as VHDL [10] do not allow evaluation until complete models exist. Such models include not only abstract decisions, but also low level component design decisions. For large systems, simulatable models appear late in the design, increasing the cost of error correction.

A solution to late evaluation is formal analysis at the requirements level. Formal representation of requirements and abstract architectures supports analysis of incomplete systems at high abstraction levels. Furthermore, formalisms provide some guarantee of rigorous representation and correctness in analysis. Abstract architectures support representation and analysis of requirements partitioning attempts and architecture level design decisions.

Support for this work was provided in part by the Advanced Research Projects Agency and monitored by Wright Labs under the RASSP Technology Program, contract number F33615-93-C-1316.


An abstract architecture is an interconnected collection of components where the requirements of each component are specified without defining their implementation. Thus, an abstract architecture describes a class of solutions rather than a single instance. Instead of waiting for a completed system including design detail, formally described abstract architectures can be evaluated when architecture decisions are made. VSPEC [1, 2], a Larch interface language [4, 6] for VHDL [10], is a requirements description language that includes formal architecture definition support.

VSPEC describes the requirements of digital system components using the canonical Larch approach and interconnects component descriptions using VHDL's structural definition features. Each VHDL entity is annotated with a pre- and post-condition to indicate the component's functional requirements. VSPEC-annotated entities are connected together using a VHDL structural architecture to form an abstract architecture. The VHDL architecture indicates interconnection in the traditional manner, but the requirements of each component are defined instead of their implementations. An activation condition can be defined to explicitly indicate when a component should execute. Finally, VSPEC allows a designer to describe non-functional requirements critical in selecting from alternative architecture implementations.

2  A  Brief  Summary  of  VSPEC 

VSPEC is a requirements specification language for digital systems. As a requirements specification language, it is used very early in the design process to describe “what” a digital system must do. The operational style of VHDL makes VHDL alone ill-suited for requirements specification. It forces a designer to describe a system by defining a specific design artifact that describes “how” the system behaves. Using VHDL as a requirements specification language forces a designer to deal with unnecessary detail at an early point in the design process.
In contrast to VHDL's operational style, VSPEC allows a designer to declaratively describe a component. A VSPEC description of a sorting component is shown in Figure 1. As with most other Larch interface languages, the requires and ensures clauses are used to state the pre- and post-conditions of the component. The sort component has a pre-condition of true, which means it will function correctly for any set of inputs. The post-condition states that the output contains all the same elements as the input (i.e., permutation(output'post, input)) and that the output is in order. Any implementation of a sorting component that makes this post-condition true in the next state is a valid implementation of the requirements. More generally, given a component with requires predicate I(St) and ensures predicate O(St, St'post)¹, f(St) is an implementation of the requirements if the following condition holds:

$$I(St) \Rightarrow O(St, f(St)) \qquad (1)$$
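Equation 1 can be tested, though not proved, by running a candidate f against the requires and ensures predicates on a finite set of inputs. The Python sketch below does this for the sort component, assuming Python lists of integers stand in for the VSPEC array type and that Python's built-in `sorted` is the candidate implementation. The helper names (`permutation`, `is_sorted`, `satisfies_eq1`) are hypothetical stand-ins for the SortOps predicates, not part of VSPEC.

```python
from collections import Counter
from itertools import product
from typing import Callable, Iterable, List


def permutation(a: List[int], b: List[int]) -> bool:
    """Same elements with the same multiplicities."""
    return Counter(a) == Counter(b)


def is_sorted(a: List[int]) -> bool:
    return all(x <= y for x, y in zip(a, a[1:]))


def requires(inp: List[int]) -> bool:  # I(St): sort's requires clause is "true"
    return True


def ensures(inp: List[int], out: List[int]) -> bool:  # O(St, St'post)
    return permutation(out, inp) and is_sorted(out)


def satisfies_eq1(f: Callable[[List[int]], List[int]], inputs: Iterable[List[int]]) -> bool:
    """Test I(St) => O(St, f(St)) on the given inputs (a test, not a proof)."""
    return all(ensures(x, f(x)) for x in inputs if requires(x))


# All lists of length 0..3 over {0, 1, 2}.
small_inputs = [list(p) for n in range(4) for p in product(range(3), repeat=n)]
print(satisfies_eq1(sorted, small_inputs))                      # True
print(satisfies_eq1(lambda xs: sorted(set(xs)), small_inputs))  # False: drops duplicates
```

The second candidate produces ordered output but violates the permutation half of the post-condition, which is exactly the kind of implementation Equation 1 rules out.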


In addition to allowing a designer to describe “what” a component does, VSPEC also addresses another shortcoming of VHDL: it allows a designer to specify performance constraints in a consistent fashion. The VSPEC constrained by clause is used for this purpose. As shown in Figure 1, this clause defines relations over constraint variables. Currently, the defined constraint variables include power consumption, layout area (expressed as a bounding box), heat dissipation, clock speed and pin to pin timing. Constraint theories written in LSL define each constraint type. Users may define their own constraints and theories if desired.
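To show how a constrained by clause can discriminate between alternative implementations, the Python sketch below reads sort's clause in Figure 1 as a plain conjunction of numeric bounds. This is an assumption for illustration: the real meaning of each constraint comes from its LSL theory, the `Estimates` fields and the candidate values are hypothetical, and "size <= 3 um * 5 um" is read as a 3 by 5 bounding box. The pin-to-pin bound is compared in whatever time unit the clause uses.

```python
from dataclasses import dataclass


@dataclass
class Estimates:
    """Hypothetical per-implementation estimates for sort's constraint variables."""
    power_mw: float
    width_um: float
    height_um: float
    heat_mw: float
    clock_mhz: float
    io_delay: float  # input <-> output pin-to-pin time, in the clause's time unit


def meets_sort_constraints(e: Estimates) -> bool:
    """Conjunction of the relations in sort's constrained by clause (Figure 1).

    Assumption: 'size <= 3 um * 5 um' is read as a 3 x 5 bounding box.
    """
    return (e.power_mw <= 5
            and e.width_um <= 3 and e.height_um <= 5
            and e.heat_mw <= 10
            and e.clock_mhz <= 50
            and e.io_delay <= 5)


candidates = {
    "fast": Estimates(6.0, 3, 5, 9.0, 50, 2),   # exceeds the power bound
    "small": Estimates(4.5, 2, 4, 8.0, 40, 4),
}
print([name for name, est in candidates.items() if meets_sort_constraints(est)])  # ['small']
```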

The state clause contains a list of variable declarations that define the internal state of a component. These variables maintain state information that may not be recorded by the values of the component's ports. A state clause is not needed in the sorting component specification in Figure 1.

The modifies clause lists variables, ports and signals whose values are changed by the entity. Most other Larch interface languages contain a modifies clause, and the definition of VSPEC's modifies clause is very similar to the definitions found in these languages [3, 5, 8]. The includes clause is used to include Larch Shared Language definitions in a VSPEC description. The sorts and operators defined in the LSL trait named by the includes clause can be used in the VSPEC definition. In this example, the SortOps trait defines two predicates: permutation and sorted.

¹ The St'post notation references the value of St in the state after the transformation described by the entity is performed. This is analogous to the variable' notation of LCL [3, 5].
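The usual reading of a Larch-style modifies clause is a frame condition: every name not listed keeps its value across the transformation. The Python sketch below checks that condition on a pair of pre- and post-states for the sort component; representing states as dictionaries and the helper name `respects_modifies` are assumptions made for illustration.

```python
from typing import Dict, Hashable, Set


def respects_modifies(pre: Dict[str, Hashable], post: Dict[str, Hashable],
                      modifies: Set[str]) -> bool:
    """Frame condition: every name not listed in the modifies clause keeps its value."""
    return all(post[name] == value for name, value in pre.items() if name not in modifies)


pre = {"input": (3, 1, 2), "output": ()}
ok_post = {"input": (3, 1, 2), "output": (1, 2, 3)}
bad_post = {"input": (1, 2, 3), "output": (1, 2, 3)}  # sorted in place: changes input
print(respects_modifies(pre, ok_post, {"output"}))   # True
print(respects_modifies(pre, bad_post, {"output"}))  # False
```

Both post-states satisfy sort's ensures clause; only the modifies clause rejects the in-place version.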

The sensitive to clause plays the same role in a VSPEC definition that sensitivity lists and wait statements play in a VHDL description. It defines when a component is active. The sensitive to clause for sort in Figure 1 states that the entity activates (and sorts its input) whenever the input changes. The sensitive to clause contains a predicate indicating when an entity should begin executing. The next section contains a more precise semantics for the sensitive to predicate.

3 Abstract Architectures

VHDL structural architectures composed of VSPEC-annotated components specify abstract architectures. The VHDL architecture remains unchanged, indicating component instantiation and connections. However, the configuration does not assign an entity/architecture pair to each component instance in the architecture. Instead, the configuration states that each component references an entity with an architecture called VSPEC. This signifies that at the current point in the design, the requirements of this component are known (via the VSPEC description) but no implementation has been defined.

Consider the VSPEC description of a find component shown in Figure 2a. The output of find is the element from the input array with the same key as the input k. This requirement is represented by find's ensures clause. One possible way to meet this requirement is to connect the output of a sorting component to a binary search component, as shown in Figure 3. The specification for sort is the same as the one in Section 2, while the bin_search specification is shown in Figure 2b. The only difference between this structural description of find and a VHDL structural description of find is that the configuration specifies that the VSPEC descriptions of sort and bin_search should be used instead of a specific architecture for these two entities. This configuration describes an abstract architecture for the find component. Any implementation satisfying the VSPEC requirements of sort and bin_search may be associated with these entity definitions. The abstract architecture for find defines a class of solutions with a common structure.
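One member of the class of solutions described by the find abstract architecture can be sketched in Python: any sort implementation feeding any binary search implementation. The sketch assumes elements are hypothetical (key, payload) tuples, uses Python's `sorted` and `bisect.bisect_left` as the two implementations, and returns `None` when no element has key k, a case find's ensures clause leaves unconstrained. The function names are illustrative only.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Element = Tuple[int, str]  # hypothetical element type: (key, payload)


def sort_component(inp: List[Element]) -> List[Element]:
    """Any implementation meeting sort's ensures clause; here Python's sorted."""
    return sorted(inp)


def bin_search_component(inp: List[Element], k: int) -> Optional[Element]:
    """Binary search by key; only correct when requires sorted(input) holds."""
    keys = [e[0] for e in inp]
    i = bisect_left(keys, k)
    if i < len(inp) and inp[i][0] == k:
        return inp[i]
    return None  # no element has key k


def find_structure(inp: List[Element], k: int) -> Optional[Element]:
    y = sort_component(inp)  # signal y connects b1 (sorter) to b2 (searcher)
    return bin_search_component(y, k)


def find_ensures(inp: List[Element], k: int, out: Optional[Element]) -> bool:
    """find's ensures clause: if output is an element e, then e.key = k and e is in input."""
    return out is None or (out[0] == k and out in inp)


data = [(3, "c"), (1, "a"), (2, "b")]
print(find_structure(data, 2))  # (2, 'b')
print(all(find_ensures(data, k, find_structure(data, k)) for k in range(5)))  # True
```

The sorter establishes exactly the pre-condition (sorted input) that the searcher requires, which is why the composition can meet find's post-condition.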

Although a VHDL architecture referencing VSPEC definitions defines components and interconnections, additional information must be added to specify when the VSPEC components activate. In traditional sequential programming, a language construct “executes” following termination of the construct preceding it. For correct execution, a construct's pre-condition must be satisfied when the preceding construct terminates. In hardware systems, components exist simultaneously and behave as independent processes. No predefined execution order exists, so there is no means of implicitly determining when a component's pre-condition should hold.

entity sort is port
  (input: in integer_array;
   output: out integer_array);
  includes SortOps;
  modifies output;
  sensitive to input'event;
  requires true;
  ensures
    permutation(output'post, input) and
    sorted(output'post);
  constrained by
    power <= 5 mW and size <= 3 um * 5 um
    and heat <= 10 mW and clock <= 50 MHz
    and input <-> output <= 5 us;
end sort;

Figure 1: VSPEC description of a sorting component.

entity find is port
  (input: in element_array;
   k: in keytype;
   output: out element);
  includes Element(element, keytype, element_array);
  modifies output;
  sensitive to input'event or k'event;
  requires true;
  ensures forall (e : element)
    (output = e implies
      (e.key = k and elem_of(e, input)));
  constrained by
    power <= 5 mW
    and size <= 3 um * 5 um
    and k <-> output <= 5 us
    and heat <= 10 mW
    and clock <= 50 MHz;
end find;

(a)

entity bin_search is
  port (input: buffer element_array;
        k: in integer;
        value: out element);
  modifies value;
  sensitive to input'event or k'event;
  requires sorted(input);
  ensures value = e iff (e.key = k and elem_of(e, input));
  constrained by
    power <= 1 mW and
    size <= 1 um * 2 um;
end bin_search;

(b)

Figure 2: VSPEC descriptions of find and binary search components.

VHDL provides sensitivity lists and wait statements to synchronize entity execution and define when a component in a structural architecture is active. VSPEC achieves the same end using the sensitive to clause. The sensitive to clause contains a predicate, called the activation condition, that indicates when an entity should begin executing. Effectively, this activation condition defines when a VSPEC-annotated entity's pre-condition must hold. When the sensitive to predicate is true, the pre-condition must hold and the implementation must satisfy the post-condition. When the sensitive to predicate is false, the entity makes no contribution to the state of the system. In the find example, both components activate when any of their input signals change.

Formally, the contribution of the sensitive to clause to the transformation specified by VSPEC is easily represented using a traditional process algebra such as CSP [7]. Components become processes and events are defined as the states a component enters. Thus, any VSPEC component can be described by a process that consumes states and generates a process in a new state. To define such state changes, a component state is defined along with a means for combining component states into an architecture state.

The formal VSPEC model of the state of a component is based on Chalin's state model [3, Chapter 6] for LCL. This model partitions the computational state of an LCL description into an environment and a store [11]. The environment maps (variable) identifiers into objects, and the store binds objects to the values they contain:

$$Env == Id \rightarrow Obj \qquad (2)$$

$$Store == Obj \rightarrow Value \qquad (3)$$

Separating the environment and the store in this fashion is common among formal models of program state. In a language such as LCL, a motivating factor for this is to allow multiple names for the same element of memory. For example, two C pointers can reference the same memory location. The program state model above represents this situation by mapping each of these pointers to the same object in the Env map.

This partitioning of component state is used in the VSPEC state model. In addition to allowing the correct representation of VHDL access types, this partitioning also allows the state of an abstract architecture to be more easily represented. For a single VSPEC-specified component, Env contains a map from each port and state variable in the VSPEC description to an object. Store maps each of these objects to its current value. We call this the abstract state of the VSPEC component.

When VSPEC components are connected together to form an abstract architecture, the elements of Env and Store are slightly different. The Store contains objects for each port in the architecture's entity, for each signal in the architecture and for the state variables of each component in the architecture. The Env maps each of these three types of elements to the proper object, but it also maps the ports of each architecture component to the object that represents the architecture signal the port is connected to. We call the state model of an abstract architecture the concrete state of the component.

In  the  simple  two  component  example  of  Figure  4, 
the  abstract  state  of  system,  A  and  B  are: 


ETWsystem 

— 

{sys-in  <->  objsy$_in , 
t$/S-0Ut  >-*  objsys^out} 

Storesystem 

{objSyS_in  t— ►  VSyS_irii'£P 
objSyS_oUt  1  *■  Vsys—out } 

Euva  '■ 

{s  vbjy} 

StortA 

= 

{objx  i->  vx,  dbjy 

Euvb 

=  ;:; 

objw,  z  t-+  obj2}\ ||: 

StorzgM 

111 

{bbjw  P+  vw,  objz  •-+  «z} 

The  conc|pe  state  of  the  struct  architecture  is: 

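$$\mathit{Env}_{struct\ system} = \{ sys\_in \mapsto obj_{sys\_in},\; sys\_out \mapsto obj_{sys\_out},\; x \mapsto obj_{sys\_in},\; y \mapsto obj_{c},\; w \mapsto obj_{c},\; z \mapsto obj_{sys\_out} \}$$

$$\mathit{Store}_{struct\ system} = \{ obj_{sys\_in} \mapsto v_{sys\_in},\; obj_{sys\_out} \mapsto v_{sys\_out},\; obj_{c} \mapsto v_{c} \}$$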

Notice that x, y, w and z now map to the objects containing the signal values the component ports are connected to.
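The construction of the concrete state from the port maps can be sketched in Python. This is an illustrative model only, not part of VSPEC: environments and stores are plain dictionaries, object and value names are strings copied from the equations above, and the helper `concrete_env` is hypothetical. It assumes port and signal names are unique across the architecture, so they can share one flat environment.

```python
def concrete_env(system_env, port_maps, signal_objects):
    """Build the environment of a structural architecture (hypothetical helper).

    system_env:     top-level ports -> objects, i.e. Env_system
    port_maps:      instance -> {formal port: actual signal or port}
    signal_objects: internal architecture signals -> objects
    Every formal port is mapped to the object of the actual it is connected to.
    """
    actual_to_obj = {**system_env, **signal_objects}
    env = dict(system_env)
    for ports in port_maps.values():
        for formal, actual in ports.items():
            env[formal] = actual_to_obj[actual]
    return env


# Figure 4: c1: A port map (sys_in, c); c2: B port map (c, sys_out)
system_env = {"sys_in": "obj_sys_in", "sys_out": "obj_sys_out"}
port_maps = {
    "c1": {"x": "sys_in", "y": "c"},
    "c2": {"w": "c", "z": "sys_out"},
}
env = concrete_env(system_env, port_maps, {"c": "obj_c"})
store = {"obj_sys_in": "v_sys_in", "obj_sys_out": "v_sys_out", "obj_c": "v_c"}

# y (output of A) and w (input of B) now denote the same object, obj_c.
print(env["y"], env["w"], store[env["w"]])  # obj_c obj_c v_c
```

Because y and w share obj_c, a value that A writes to y is the value B reads on w, which is what allows a state change to propagate from one component to the next.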

The semantics of a VSPEC entity are defined by a CSP process that defines the sequence of states the entity passes through. Let C be an entity with sensitive to, requires and ensures predicates S(St), I(St) and O(St, St'post), respectively. The process defining C in any state r is:

$$C_r = (r : \Psi \rightarrow C_{r'\mathrm{post}}) \qquad (4)$$

where Ψ = {t : T_C | S(t)} is the set of states that satisfy C's activation condition and P_x is the process P in some state x. O(r, r'post) must hold to assure the transformation's correctness. Thus, when an external force changes the abstract state to one that satisfies the entity's activation condition (r in Equation 4), the process will consume r and behave like C_r'post. A trace of the process defined by a VSPEC entity is a sequence of abstract states the entity enters. Each of these states satisfies C's activation condition. Thus,
the alphabet of C is equal to
the  alphabet  of  C  is  equal  to 
architecture structure of find is
  component sorter
    port (input: in element_array;
          output: out element_array);
  end component;
  component searcher
    port (input: in element_array;
          key: in integer;
          value: out element);
  end component;
  signal y: element_array;
begin
  b1: sorter port map (input, y);
  b2: searcher port map (y, k, output);
end structure;


configuration test_vspec of find is
  for structure
    for b1: sorter use entity
      work.sort(VSPEC);
    end for;
    for b2: searcher use entity
      work.bin_search(VSPEC);
    end for;
  end for;
end test_vspec;


Figure  3:  A  VSPEC  abstract  architecture  representation  of  the  find  component. 


If f(St) implements the requirements specified by I(St) and O(St, St'post) (i.e. f(St) satisfies Equation 1), Equation 4 can be rewritten as:

$$C_r = (r : \Psi \rightarrow C_{f(r)}) \qquad (5)$$

In this situation, the process consumes r and f is applied to r to generate a new abstract state. The entity then behaves like the process defined by C_f(r).
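One step of Equation 5 can be sketched as a Python function. The example entity is the sort entity (activated by input'event, requires true, ensures that the output is an ordered permutation of the input). The state encoding is an assumption of this sketch: a pending input'event is a boolean flag in a dictionary, `step` and `sort_f` are hypothetical names, and Python's `sorted` stands in for an arbitrary f that satisfies the requires and ensures clauses.

```python
from collections import Counter


def step(r, S, I, O, f):
    """One step of C_r = (r : Psi -> C_f(r)), Equation 5 (illustrative).

    Returns the next abstract state, or None when r is not in Psi,
    i.e. the activation condition S(r) is false and the process does
    not engage in r.
    """
    if not S(r):
        return None
    if not I(r):
        raise ValueError("activated in a state where the requires clause is false")
    r_post = f(r)
    if not O(r, r_post):
        raise ValueError("f does not establish the ensures clause")
    return r_post


# Hypothetical encoding of the sort entity's abstract state.
def S(r):
    return r["input_event"]                       # sensitive to input'event


def I(r):
    return True                                   # requires true


def O(r, rp):
    return (Counter(rp["output"]) == Counter(r["input"])      # permutation
            and list(rp["output"]) == sorted(rp["output"]))   # ordered


def sort_f(r):
    # Any sorting algorithm would do; the step consumes the pending event.
    return {"input": r["input"], "input_event": False,
            "output": tuple(sorted(r["input"]))}


print(step({"input": (3, 1, 2), "input_event": True, "output": ()}, S, I, O, sort_f))
# {'input': (3, 1, 2), 'input_event': False, 'output': (1, 2, 3)}
print(step({"input": (3, 1, 2), "input_event": False, "output": ()}, S, I, O, sort_f))
# None
```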

CSP's concurrency operator combines component processes to define the behavior of a VSPEC architecture. Let C_1, C_2, ..., C_n be the processes represented by Equation 4 or 5 for the set of VSPEC component instances in architecture A. The process representing architecture A is:

$$A = C_1 \parallel C_2 \parallel \cdots \parallel C_n \qquad (6)$$


When the current state satisfies some component's activation condition, the component performs its specified transformation to its abstract state. This change is propagated to the concrete state of the architecture, where the activation condition of another component may be satisfied. This causes the process to repeat until the system changes to a concrete state where no component's activation condition is satisfied. The system then waits until some external source changes the concrete state to one that activates some component in the architecture, starting the process again.
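The propagate-until-quiescent behavior described above can be sketched as a small event-driven loop over the find architecture, where sorter b1 feeds searcher b2 through signal y. This is a sequential, interleaving simplification of CSP parallel composition, not a CSP implementation: a component is active when an event is pending on one of its input signals, the first active component in list order fires, and a changed output raises an event on that signal. The component behaviors, the `run_to_quiescence` name and the convention that each element is its own key are assumptions of the sketch.

```python
from bisect import bisect_left


def sorter(v):
    # b1: Python's sorted() is one implementation meeting sort's ensures clause
    return {"y": tuple(sorted(v["input"]))}


def searcher(v):
    # b2: binary search over y; elements are assumed to be their own keys
    y, k = v["y"], v["k"]
    i = bisect_left(y, k)
    return {"output": y[i] if i < len(y) and y[i] == k else None}


# (instance, input signals, output signals, transformation)
COMPONENTS = [
    ("b1", ("input",), ("y",), sorter),
    ("b2", ("y", "k"), ("output",), searcher),
]


def run_to_quiescence(store, events, components=COMPONENTS):
    """Fire active components until no activation condition holds.

    store:  concrete state, signal name -> value (updated in place)
    events: signals with a pending 'event (the external stimulus)
    Returns the instances that fired, in order.
    """
    events = set(events)
    fired = []
    while True:
        active = [c for c in components if events & set(c[1])]
        if not active:
            return fired              # quiescent: wait for an external event
        name, ins, outs, f = active[0]
        events -= set(ins)            # the component consumes its input events
        new_vals = f({s: store[s] for s in ins})
        for s in outs:
            if store[s] != new_vals[s]:
                store[s] = new_vals[s]
                events.add(s)         # the change propagates to connected ports
        fired.append(name)


store = {"input": (5, 3, 9), "k": 3, "y": (), "output": None}
print(run_to_quiescence(store, {"input", "k"}), store["output"])  # ['b1', 'b2'] 3
store["k"] = 9
print(run_to_quiescence(store, {"k"}), store["output"])           # ['b2'] 9
```

The second call shows an external change to k activating only the searcher, since the sorter's activation condition does not mention k.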

In the CSP model of a VSPEC process, this notion can be understood by examining the possible traces of A from Equation 6. Hoare [7] defines traces over parallel composition, traces(C_1 || C_2), as:

$$\mathit{traces}(C_1 \parallel C_2) = \{\, t \mid (t \upharpoonright \alpha C_1) \in \mathit{traces}(C_1) \wedge (t \upharpoonright \alpha C_2) \in \mathit{traces}(C_2) \wedge t \in (\alpha C_1 \cup \alpha C_2)^{*} \,\}$$

Thus, the traces of a parallel composition of components are all traces that, when restricted to the alphabet of each component, yield a trace of that component.2 Furthermore, traces of C_1 || C_2 only contain events from the alphabet of either component. Thus, every trace of A contains only states that satisfy the activation condition of at least one component in A.
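Hoare's definition can be exercised directly on finite trace sets. In this sketch, the hypothetical processes C1 and C2 are each given as an alphabet plus an explicit prefix-closed set of traces, and abstract states are represented by string labels; `restrict` implements t ↾ αP and `in_parallel_traces` tests the three conjuncts of the definition.

```python
def restrict(t, alphabet):
    """t restricted to alphabet: keep only the events of t that are in it."""
    return tuple(e for e in t if e in alphabet)


def in_parallel_traces(t, p1, p2):
    """Membership test for traces(C1 || C2); each p is (alphabet, set of traces)."""
    (a1, tr1), (a2, tr2) = p1, p2
    return (all(e in a1 | a2 for e in t)
            and restrict(t, a1) in tr1
            and restrict(t, a2) in tr2)


# Hypothetical components sharing state "s2", so they must enter it together.
C1 = ({"s1", "s2"}, {(), ("s1",), ("s1", "s2")})
C2 = ({"s2", "s3"}, {(), ("s2",), ("s2", "s3")})

print(in_parallel_traces(("s1", "s2", "s3"), C1, C2))  # True
print(in_parallel_traces(("s2", "s1"), C1, C2))        # False: C1 cannot start with s2
print(in_parallel_traces(("s1", "s4"), C1, C2))        # False: s4 is in neither alphabet
```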

If A enters a state where none of its components' activation conditions is true, it will wait for a change on one of its input ports. Sequences in traces(A) contain only states that activate a component of A, so the process representing A only consumes those states. However, a change to a component's input port also causes a state change, and inactive components must wait for events from external sources to initiate activation. traces(A) is not strictly the set of all states a component may enter, but the set of all states a component enters from active states.

4  Conclusions 

This paper presented a basic introduction to VSPEC, a requirements specification language for VHDL. A VSPEC specification describes the pre-condition, post-condition, performance constraints and activation

2 Recall that in CSP [7], t ↾ αP restricts the trace t to contain only events that appear in the alphabet of P.
entity A is port
    (x : in integer;
     y : out integer);
  requires I_A(x);
  ensures O_A(x, y'post);
  modifies y;
end A;

entity B is port
    (w : in integer;
     z : out integer);
  requires I_B(w);
  ensures O_B(w, z'post);
  modifies z;
end B;

entity system is port
    (sys_in : in integer;
     sys_out : out integer);
end system;

Figure 4: Example of two entities connected serially.
condition of a VHDL entity. When the activation condition is true, the entity's pre-condition must hold and the entity is responsible for making its post-condition hold in the next state. The semantics of a single-component VSPEC specification is based on the canonical Larch axiomatic approach, while CSP is used to define the semantics of an architecture of components.
architecture struct of system is
  component A
    port (x : in integer;
          y : out integer);
  end component;

  component B
    port (w : in integer;
          z : out integer);
  end component;
  signal c : integer;

begin
  c1: A port map (sys_in, c);
  c2: B port map (c, sys_out);
end struct;
Abstract

Complex digital systems are often decomposed into architectures very early in the design process. Unfortunately, traditional simulation-based languages such as VHDL do not allow the impact of these architectural decisions to be evaluated until a complete, simulatable design of the system is available. After a complete design is available, architectural errors are time-consuming and expensive to correct. However, there is an alternative to simulation-based techniques: formal analysis of abstract architectures at the requirements level. This paper describes VSPEC's approach for defining and analyzing abstract architectures. VSPEC is a Larch interface language for VHDL that allows a designer to specify the requirements of a VHDL entity using the canonical Larch approach. VHDL structural architectures that instantiate VSPEC entities define abstract architectures. These abstract architectures can be evaluated at the requirements level to determine the impact of architectural decisions. This paper briefly introduces VSPEC, provides a formal definition of VSPEC abstract architectures and presents two examples that illustrate the architectural definition capabilities of the language.
by Wright Labs under the RASSP Technology Program, contract number F33615-93-C-1316.
1 Introduction

Architectural design decisions made early in a system's design profoundly affect overall design quality. Unfortunately, architecture decisions are rarely evaluated until late in the design process. Simulation-based design languages such as VHDL [5, 12] do not allow evaluation until complete models exist. For large systems, simulatable models appear late in the design process, driving up the cost of error correction. These models include not only architectural decisions, but also component design decisions. The ability to analyze architectural decisions as they are made would significantly reduce this cost.

A solution to late architecture evaluation is formal analysis of abstract architectures at the requirements level. An abstract architecture is an interconnected collection of components where the requirements of each component are specified without defining their implementation. Thus, an abstract architecture describes a class of solutions with a common structure rather than a single instance from that class. Formally described abstract architectures can be evaluated early in the design process, when architecture decisions are made and before component designs exist.

VSPEC [7], a Larch interface language [10] for VHDL [12], is a requirements specification language that includes formal support for architecture definition. VSPEC describes the requirements of digital system components using the canonical Larch approach. Each VHDL entity is annotated with a pre- and post-condition to specify the entity's functional requirements. VSPEC-annotated entities can be connected together using a VHDL structural architecture to form abstract architectures. The VHDL architecture indicates interconnection in the traditional manner, but the VSPEC specification defines the requirements of each component instead of a specific design.

The description of a sorting component illustrates the difference between VHDL and VSPEC. In VHDL, the simplest way to describe the function of a sorting component is a behavioral architecture that implements a quicksort, bubble sort or some other sorting algorithm. This is actually a description of “how” the sorting component behaves. In contrast, a VSPEC specification of this
entity sort is
  port (input: in element_array;
        output: out element_array);
  includes SortPredicates;
  modifies output;
  sensitive to input'event;
  ensures
    permutation(output'post, input);
    ordered(output'post);
end sort;

Figure 1: VSPEC description of a sort entity.


component explicitly describes “what” the device must do without defining “how” it is done. A VSPEC description of a sorting component is shown in Figure 1. It states that the output has all the same elements as the input (permutation(output'post, input)) and that the output is in order (ordered(output'post)). Any sorting algorithm may be used to implement these requirements, but VSPEC allows this algorithm to be chosen later in the design process.
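The two predicates in Figure 1 can be written as executable checks to show that the specification accepts any correct sorting algorithm. This is a Python sketch, not LSL: `permutation` and `ordered` are modeled on the names in the ensures clause (their actual definitions live in the LSL trait SortPredicates, which is not shown here), and `bubble_sort` is one arbitrary implementation.

```python
from collections import Counter


def permutation(a, b):
    """a and b contain the same elements with the same multiplicities."""
    return Counter(a) == Counter(b)


def ordered(a):
    """Each element is <= its successor."""
    return all(x <= y for x, y in zip(a, a[1:]))


def sort_ensures(input_, output_post):
    # ensures permutation(output'post, input); ordered(output'post);
    return permutation(output_post, input_) and ordered(output_post)


def bubble_sort(xs):
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs


data = [4, 1, 3, 1]
print(sort_ensures(data, bubble_sort(data)))  # True
print(sort_ensures(data, sorted(data)))       # True
print(sort_ensures(data, [1, 3, 4]))          # False: an element was dropped
```

Both algorithms pass the same check, which is the point of the declarative specification: the choice between them can be deferred.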

Larch interface languages have been developed for a variety of programming languages including C [9], C++ [15] and Modula-3 [14]. At the single-component level, VSPEC differs very little from other interface languages. However, defining a Larch interface language for VHDL presents a problem not found in these other languages. In traditional programming languages, a language construct executes after the construct immediately preceding it terminates. In VHDL, there is no implicit execution order among process-level constructs and thus no means of determining when a component's pre-condition should hold. VSPEC addresses this problem by allowing a user to define an activation condition in addition to the pre- and post-condition for an entity. When an entity's state satisfies its activation condition, its pre-condition must hold and the entity must perform its specified transformation.

This paper describes VSPEC, concentrating on the language's facilities for describing abstract architectures. Section 2 provides a brief summary of the VSPEC language. Section 3 describes VSPEC abstract architectures, including a definition of the VSPEC state model and a description of how a process algebra (CSP [11]) is used to provide a semantics for the VSPEC activation condition.
Section 4 presents two example VSPEC specifications, concentrating on the architecture representation portions of each specification. Finally, the paper concludes with a discussion of related work and a brief summary.


2 VSPEC


VSPEC is used to describe “what” a digital system should do. It adds a requirements definition capability to VHDL entities analogous to the requirements definition capability that Larch interface languages add to traditional procedure and function signatures. As shown in Figure 2, the requirements of a VHDL entity can be defined by describing a relationship from the current inputs and state of the system to the outputs and the next state. This section describes how F(x, s) and s are defined in VSPEC and contrasts these definitions with VHDL definitions of F(x, s) and s.


Figure 2: State-based specification model (diagram of entity E with its input and output ports).

As shown in the find entity of Figure 3, a VHDL entity defines an interface. The output of find should be the element from the input array with the same key as the key input. A VHDL entity does not describe functional information such as this; it only defines the component's interface.

entity find is port
    (input: in element_array;
     key: in keytype;
     output: out element);
end find;

Figure 3: A VHDL entity defining the interface for a find component.
architecture behavior of find is
begin
  process (input, key)
  begin
    for i in input'range loop
      if key = input(i).key then
        output <= input(i);
        exit;
      end if;
    end loop;
  end process;
end behavior;

Figure 4: A behavioral VHDL architecture defining the find component's behavior.

The VHDL architecture construct describes the function of a component by associating behavior and/or structure with an entity. Figure 4 is a behavioral VHDL description of the find component's function. In terms of the state model in Figure 2, this architecture describes F(x, s) as a linear search algorithm. This looks very similar to a C or Pascal function describing “how” the system behaves. Unfortunately, this operational description biases the system towards a particular implementation. Since VSPEC's purpose is requirements specification, it is undesirable to bias the system to a particular implementation this early in the design process.

VSPEC eliminates this problem by allowing a user to declaratively specify the requirements of a digital system. Seven clauses annotate the VHDL entity construct to allow the specification of “what” a component should do instead of VHDL's description of “how” the component performs this function. The requires, ensures and sensitive to clauses are used to specify the device's functional requirements. Non-functional constraints are described in the constrained by and modifies clauses. The component's internal state is declared in the state clause, and the includes clause is used to make types and operators from a Larch Shared Language description visible in a VSPEC component. The remainder of this section briefly summarizes these clauses. For a more complete description of the VSPEC clauses, see the other VSPEC references [1, 7].

Component function is described in the requires and ensures clauses. The requires clause defines a pre-condition over inputs and state variables, while the ensures clause defines a post-
entity find is port
    (input: in element_array;
     k: in keytype;
     output: out element);
  includes Element(element, keytype,
                   element_array);
  modifies output;
  requires true;
  ensures forall (e : element)
    (output = e implies
      (e.key = k
       and elem_of(e, input)));
  constrained by
    power <= 5 mW
    and k<->output <= 5 us
    and heat <= 10 mW
    and clock <= 50 MHz;
end find;

Figure 5: The find entity annotated with a VSPEC definition.


condition  over  inputs,  outputs  and  state  variables.  The  ensures  clause  defines  legal  outputs  and 
the  next  state  when  the  requires  clause  is  satisfied.  A  component’s  user  is  responsible  for  making 
certain  the  requires  clause  is  satisfied  whenever  the  component  is  in  use.  When  the  requires 
clause  is  satisfied,  the  described  entity  is  responsible  for  making  the  ensures  clause  true. 

Let σ be the state of a VSPEC entity as defined by its ports and state variables. If I(σ) is the requires predicate and O(σ, σ') is the ensures predicate, then the VSPEC annotation defines the following requirements:

$$\forall \sigma \bullet \exists \sigma' \bullet I(\sigma) \Rightarrow O(\sigma, \sigma') \qquad (1)$$

F(σ) is an implementation of these requirements if the following condition holds:

$$\forall \sigma \bullet I(\sigma) \Rightarrow O(\sigma, F(\sigma)) \qquad (2)$$
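Equation 2 can be checked mechanically when the state space is finite. The sketch below assumes a tiny domain: states are pairs (input, k), where input ranges over all orderings of (1, 2, 3), k ranges over the same values, and each element serves as its own key. The requires and ensures predicates follow the bin_search specification in Figure 7 (requires ordered(input); the output has key k and is an element of input). `implements` and the two candidate functions are hypothetical names.

```python
from bisect import bisect_left
from itertools import permutations


def implements(domain, I, O, F):
    """Equation 2 over a finite domain: for all s, I(s) implies O(s, F(s))."""
    return all(O(s, F(s)) for s in domain if I(s))


def I(s):                 # requires ordered(input)
    inp, _ = s
    return list(inp) == sorted(inp)


def O(s, value):          # value.key = k and element_of(value, input)
    inp, k = s
    return value == k and value in inp


def bin_search(s):
    inp, k = s
    i = bisect_left(inp, k)
    return inp[i] if i < len(inp) and inp[i] == k else None


def first_element(s):     # a deliberately wrong implementation
    return s[0][0]


domain = [(p, k) for p in permutations((1, 2, 3)) for k in (1, 2, 3)]
print(implements(domain, I, O, bin_search))               # True
print(implements(domain, I, O, first_element))            # False
print(implements(domain, lambda s: True, O, bin_search))  # False: unsorted inputs
```

The last line shows why the requires clause matters: binary search only satisfies the ensures clause in states where its pre-condition holds.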

The VSPEC definition of the find component is shown in Figure 5. Notice that the requires clause predicate is true, meaning this entity will function correctly for any set of inputs of the proper type. The ensures clause predicate states that the output element has the same key as the k input and that output is in the input sequence. In terms of the state model in Figure 2, this defines the requirements of F(x, s), but unlike the VHDL description, it does not describe how to implement the component.

The  VSPEC  sensitive  to  clause1  is  used  to  define  when  a  component  in  an  abstract  architecture 
is  active.  When  the  sensitive  to  clause  predicate  is  true,  a  component’s  pre-condition  must  hold 
and  an  implementation  must  satisfy  the  post-condition.  A  more  precise  description  of  this  clause 
can  be  found  in  Section  3. 

Performance constraints are described in the constrained by and modifies clauses. Constraints define requirements such as clock speed or layout area that are not part of the functional description. The constrained by clause defines relations over constraint variables. Currently, the defined constraint variables include power consumption, clock speed, area, pin-to-pin timing, and heat dissipation. Constraint theories written in the Larch Shared Language (LSL) [10] define each constraint type. Users may define their own constraints and theories if desired. The modifies clause lists variables, ports and signals whose values may be changed by the entity. This clause is useful when specifying whether an entity modifies a shared variable. The list of objects an entity modifies is not a traditional performance constraint, but it does restrict the set of potential solutions. Examples of the constrained by and modifies clauses are shown in Figure 5.

The state of a VSPEC entity is described by the port definition and the variables in the state clause. In VHDL, ports maintain their values between entity invocations. Thus, port values from the previous state may be accessed in the current state. The state clause is used to define internal state variables that are used in the VSPEC definition only. These variables maintain state information that is not recorded in port values. When a VSPEC specification is refined into a VHDL architecture, these internal state variables will be refined into signals or variables that represent the same information. The state clause variable declaration represents this information during the requirements specification phase of the entity's design. An example of the state clause can be found in the Move Machine description in Section 4.2.

1 Previous versions of VSPEC [1, 2, 7] did not have a sensitive to clause.
The includes clause is the final VSPEC clause.2 This clause is used to include LSL definitions in a VSPEC description or VHDL package declaration (see Section 4.2).3 LSL is used to define the types and functions used in a VSPEC specification. An example of the includes clause is shown in Figure 5; its syntax is the keyword includes followed by a list of trait references. The syntax of a trait reference is similar to a trait reference in LSL. It consists of the trait name followed by an optional parameter list. The parameter list is used to rename LSL names to a name visible in the VSPEC entity. Thus, an integer stack is included in a VSPEC specification with this includes clause: includes Stack(integer, int_stack).

3 Architectures


The previous section briefly described how VHDL and VSPEC are used to define the requirements of a single device in a digital system. The behavior of a device can also be described by decomposing it into smaller pieces and connecting these pieces together to form an architectural description of the device. This architectural description represents a refinement of the device's behavioral VHDL/VSPEC description. VHDL provides convenient facilities for defining architectural descriptions. This section briefly discusses these facilities and then describes how VSPEC uses them to form an abstract architecture.

3.1 VHDL Structural Architectures

VHDL uses structural architectures to represent component composition. A structural architecture describes how subcomponents are connected together to form a larger component. Figure 6 shows a structural architecture for find. Unlike the behavioral representation in Figure 4, this architecture indicates that a sort component connected to a search component implements the find function.

2 Previous versions of VSPEC [1, 2, 7] also contained a based on clause. The modified syntax of the includes clause described here made the based on clause obsolete.

3 Allowing includes clauses in package declarations is a change from previous versions of VSPEC [1, 2, 7].
This structural architecture should perform the same function as that specified in the behavioral description.

The VHDL component construct defines each component used in a structural architecture. The structure architecture of find in Figure 6 declares two types of components that are used in this architecture: sorter and searcher. One instance of each of these components (named b1 and b2) is created in the body of this architecture. The port maps of these component instances indicate how the components are connected together. In the structure architecture for find, the system's input array is connected to the sorter input and the sorter output is connected to the internal architecture signal y. The signal y and system input k are inputs to the searcher component. The output of the searcher is connected to the device output.


The VHDL configuration construct is used to bind entity-architecture pairs to component instances. In this example, the test_struct configuration binds the bubble sort defined by entity sort with architecture behavior to the b1 instance of the sorter component. Similarly, the binary search defined by entity bin_search with architecture behavior is bound to the b2 instance of searcher. If there were other architectures for these two entities (such as a structural architecture), a different configuration could have been specified stating that the components in structure map to those architectures. Entirely different entities could even have been used.


Since a structural architecture only defines dataflow between components, an additional mechanism must be provided to define when a component activates. VHDL accomplishes this with sensitivity lists and wait statements. A sensitivity list contains a list of signals. Whenever an event occurs on one of these signals, the process resumes execution. The behavior architecture for sort is sensitive to its single input, while bin_search is sensitive to its input array and key value. This means the sort component sorts its input only when new input arrives. Likewise, a search occurs only when the key value or input array changes. A wait statement achieves the same result by waiting on signal conditions or for a specific simulation time interval. In this example, wait statements could replace sensitivity lists by removing the sensitivity lists and placing wait
architecture structure of find is
  component sorter
    port (input: in element_array;
          output: out element_array);
  end component;
  component searcher
    port (input: in element_array;
          key: in keytype;
          value: out element);
  end component;
  signal y: element_array;
begin
  b1: sorter port map (input, y);
  b2: searcher port map (y, k, output);
end structure;

entity bin_search is
  port (input: in element_array;
        key: in keytype;
        value: out element);
end bin_search;

architecture behavior of bin_search is
begin
  process (input, key) begin
    -- Binary search algorithm
    -- definition in behavioral VHDL
  end process;
end behavior;

configuration test_struct of find is
statements referencing the same signals at the end of the process definitions.
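The sensitivity-list rule can be sketched as a lookup from signals to processes: after a set of signals changes, exactly the processes whose sensitivity lists mention one of them resume. This is an illustrative model, not VHDL simulator semantics (it ignores delta cycles and wait statements). The lists are written in terms of the signals each instance is connected to in the structure architecture (b1 reads find's input; b2 reads y and k), and `resumed_processes` is a hypothetical name.

```python
def resumed_processes(sensitivity, changed):
    """Processes that resume because an event occurred on a listed signal."""
    return sorted(p for p, signals in sensitivity.items() if signals & changed)


# Sensitivity lists of the behavior architectures, expressed on architecture signals.
sensitivity = {
    "b1": {"input"},      # sort: process (input)
    "b2": {"y", "k"},     # bin_search: process (input, key), connected to y and k
}

print(resumed_processes(sensitivity, {"k"}))           # ['b2']
print(resumed_processes(sensitivity, {"input"}))       # ['b1']
print(resumed_processes(sensitivity, {"input", "k"}))  # ['b1', 'b2']
```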

These constructs allow VHDL to support architecture representation. Component declarations
describe  the  inputs  and  outputs  of  each  component  type  used  in  the  architecture.  Instances  of  these 
components  are  created  in  the  architecture  body  and  configurations  are  used  to  map  component 
instances  to  an  entity/architecture  pair.  Net  lists  indicate  signal  flow  between  component  instances 
while  sensitivity  lists  or  wait  statements  synchronize  component  actions. 

3.2  VSPEC  Abstract  Architectures 

VHDL structural architectures containing VSPEC-annotated components specify abstract architectures. The VHDL architecture remains unchanged, indicating component instantiation and connections. However, a VHDL architecture is not assigned to each component instance in the architecture. Instead, the configuration specifies that each component references an entity with an architecture called VSPEC. This signifies that, at the current point in the design, the requirements of this component are known (via the VSPEC description) but no implementation has been defined.4

The structure architecture of find shown in Figure 6 becomes an abstract architecture by referencing VSPEC definitions of the instantiated components. Figure 7 shows VSPEC entity definitions for the sort and bin_search components in Figure 6. A new configuration, test_vspec, has been defined for the find entity. It specifies that the VSPEC descriptions of sort and bin_search should be used instead of a specific architecture for these two entities. This configuration describes an abstract architecture for the find component. Any implementation satisfying the VSPEC requirements of sort and bin_search may be associated with the entity definitions. The architectures specified in Figure 6 represent one such solution, but there are many others.

The VSPEC description of sort specifies the requirements for a sorting component: the input and output must have all the same elements (i.e. output is a permutation of input) and the output must

4 This is different from leaving the entity open. When a VHDL entity is left open, the design is being deferred: at the current point in the design, nothing is known about the function of the entity. In contrast, the requirements of a VSPEC entity are known even though an implementation is not.
be in order. In a similar fashion, the bin_search specification states that whenever the component input is sorted, the component must ensure that the output element contains the same key as the k input and that this element is an element of the input array. The requires and ensures clauses of these entities use two predicates (permutation and ordered) to define these requirements. These predicates are defined in the LSL trait SortPredicates, which is included in both VSPEC entities.
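The benefit of an abstract architecture is that the composition can be checked against the requirements of find (Figure 5) before any implementation exists. The sketch below does this by brute force over a small finite domain, under stated assumptions: elements are integers that serve as their own keys, arrays have length 3 over {1, 2, 3}, find's ensures clause is checked only for keys present in the array, and `sorted` is one candidate sort implementation. The predicate and helper names are hypothetical Python stand-ins for the LSL trait SortPredicates.

```python
from bisect import bisect_left
from collections import Counter
from itertools import product


def permutation(a, b):
    return Counter(a) == Counter(b)


def ordered(a):
    return all(x <= y for x, y in zip(a, a[1:]))


def bin_search(arr, k):
    # b2: only required to work when ordered(arr) holds (its requires clause)
    i = bisect_left(arr, k)
    return arr[i] if i < len(arr) and arr[i] == k else None


def architecture_ok(sort_impl, inputs, keys):
    """Check b1 -> y -> b2 against find's ensures clause on a finite domain."""
    for inp in inputs:
        y = tuple(sort_impl(inp))
        # sort's ensures clause; ordered(y) also discharges bin_search's requires
        if not (permutation(y, inp) and ordered(y)):
            return False
        for k in keys:
            out = bin_search(y, k)
            # find: the output has key k and is an element of the input array
            if k in inp and not (out == k and out in inp):
                return False
    return True


inputs = list(product((1, 2, 3), repeat=3))
print(architecture_ok(sorted, inputs, (1, 2, 3)))       # True
print(architecture_ok(lambda a: a, inputs, (1, 2, 3)))  # False: output not ordered
```

The key step is that sort's post-condition (ordered) is exactly bin_search's pre-condition, which is the kind of architectural obligation VSPEC makes visible before either component is designed.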

entity sort is
  port (input: in element_array;
        output: out element_array);
  includes SortPredicates;
  modifies output;
  sensitive to input'event;
  ensures
    permutation(output'post, input);
    ordered(output'post);
end sort;

entity bin_search is
  port (input: buffer element_array;
        key: in keytype;
        value: out element);
  includes SortPredicates;
  modifies value;
  sensitive to key'event or input'event;
  requires ordered(input);
  ensures value = e iff (e.key = key and
    element_of(e, input));
end bin_search;

configuration test_vspec of find is
  for structure
    for b1: sorter use entity
      work.sort(VSPEC);
    end for;
    for b2: searcher use entity
      work.bin_search(VSPEC);
    end for;
  end for;
end test_vspec;

Figure 7: VSPEC definitions for the sort and bin_search components in the find architecture.


Although a VHDL architecture referencing VSPEC definitions defines components and interconnections, additional information must be added to specify when the VSPEC components activate. In traditional sequential programming, a language construct "executes" following termination of the construct preceding it. For correct execution, a construct's pre-condition must be satisfied when the preceding construct terminates. In hardware systems, components exist simultaneously and behave as independent processes. No predefined execution order exists, so there is no means of determining when a component's pre-condition should hold. Consider the find example. The pre-condition of bin_search need hold only when sort has completed its transformation. At all other times, bin_search need only maintain its state.

VHDL provides sensitivity lists and wait statements to synchronize entity execution. VSPEC achieves the same end using the sensitive to clause. The sensitive to clause contains a predicate, called the activation condition, indicating when an entity should begin executing. Effectively, the activation condition defines when a VSPEC annotated entity's pre-condition must hold. When the sensitive to predicate is true, the pre-condition must hold and the implementation must satisfy the post-condition. When the sensitive to predicate is false, the entity makes no contribution to the next state of the system. Like the requires and ensures clauses, the sensitive to predicate is defined over entity port definitions and variables defined in the state clause.

Recall that the structural VHDL architecture for find (Figure 6) specified that the sort component should activate only when its input changes and that the binary search component activates when one of its inputs changes. Without the sensitive to clause, specifying this behavior in VSPEC would not be possible. Note the sensitive to clauses defined in the VSPEC descriptions of the find components in Figure 7. In VSPEC, a signal's 'event attribute is true if the signal changed value from the previous state. Thus, both components activate whenever any of their inputs change value.
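The activation conditions can be checked mechanically by comparing consecutive states. In the Python sketch below, the signal names data (sort's input), sorted (sort's output, also bin_search's input) and k are hypothetical stand-ins for the connections in Figure 6, and 'event is computed as described above: a signal's value differs from its value in the previous state. The last function expresses the obligation that bin_search's requires clause must hold only when it activates.

```python
def events(prev, cur):
    """'event for each signal: True if its value differs from the previous state."""
    return {name: prev[name] != cur[name] for name in cur}


def ordered(seq):
    return all(a <= b for a, b in zip(seq, seq[1:]))


def activations(prev, cur):
    """Evaluate the sensitive to clauses of sort and bin_search."""
    ev = events(prev, cur)
    return {
        "sort": ev["data"],                     # sensitive to input'event
        "bin_search": ev["k"] or ev["sorted"],  # sensitive to k'event or input'event
    }


def check_obligation(prev, cur):
    """bin_search's requires clause must hold whenever it activates."""
    return not activations(prev, cur)["bin_search"] or ordered(cur["sorted"])


s0 = {"data": [3, 1, 2], "sorted": [1, 2, 3], "k": 2}
s1 = {"data": [5, 4], "sorted": [1, 2, 3], "k": 2}   # new data arrives
s2 = {"data": [5, 4], "sorted": [4, 5], "k": 2}      # sort has produced its output

act1 = activations(s0, s1)   # only sort activates
act2 = activations(s1, s2)   # only bin_search activates
```

Because bin_search only activates after sort has written an ordered array, its pre-condition is required exactly when it can be guaranteed.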


3.3  Architecture  Model  Semantics 

The previous section provided an informal description of how VSPEC can be used to define an abstract architecture. This section provides a more precise, formal definition of the concepts presented above. First, the state of a VSPEC description is defined. After this, a precise definition of how the sensitive to, requires and ensures clauses define a transformation over this state is presented. The section concludes with a simple example that illustrates these points.

3.3.1  State  Definition 

The state definition for an entity is a map from port, signal and variable names to their values. There are three different views of an entity state: (1) abstract; (2) component; and (3) concrete state. The abstract state is defined by a VSPEC description of an entity. The component state is the state of a single component in an abstract architecture, and the concrete state represents the state of all components of an abstract architecture.

The abstract state includes the ports and state variables of an entity. The VSPEC sensitive to, requires and ensures clause predicates are defined over elements of the abstract state of the entity. The component state applies to an entity included as a component in a structural architecture. The component state is formed by taking the entity's abstract state and subjecting it to the renaming imposed by the signals the component is connected to in the architecture. This component state is used to construct the concrete state of the structural architecture. The concrete state is the union of the component states for all of the components in an architecture. This structural architecture represents a refinement of the VSPEC definition of the entity. There is an abstraction function mapping the concrete state of the structural architecture to the abstract state defined by the VSPEC description of the entity the structural architecture refines.

Consider the VSPEC entities in Figure 8. The abstract state of each of the three entities in this figure consists of its inputs, outputs and state variables. Thus, the abstract states of these entities are:

$$\mathrm{ABSTRACT}_{\mathit{system}} = \{\mathit{sys\_in} \mapsto i_0,\ \mathit{sys\_out} \mapsto i_1,\ \mathit{sys\_state} \mapsto i_2\}$$

$$\mathrm{ABSTRACT}_{\mathit{comp1}} = \{\mathit{in1} \mapsto i_3,\ \mathit{result} \mapsto i_4,\ \mathit{c1\_state} \mapsto i_5\}$$

$$\mathrm{ABSTRACT}_{\mathit{comp2}} = \{\mathit{in1} \mapsto i_6,\ \mathit{in2} \mapsto i_7,\ \mathit{result} \mapsto i_8,\ \mathit{c2\_state} \mapsto i_9\}$$

where $i_0, i_1, \ldots, i_9$ are all integers. As shown, the state is a map from names to values. However, for clarity we will show just the names that form the various states throughout the rest of this paper.

Within the struct architecture for the system entity, the component state of A (the first instance of comp1) is found by taking comp1's abstract state and performing the renaming defined by the signals the component is connected to. In this case, in1 is connected to sys_in and result is connected to signal x. Thus, in the context of the struct architecture, in1 of component instance A should be replaced by sys_in and result replaced by x. A similar renaming can easily be found for the inputs and outputs of the other components in the struct architecture. The renaming for the other components is shown in the definitions of their component states below.

entity system is
  port (sys_in  : in integer;
        sys_out : out integer);
  state (sys_state : integer);
end system;

entity comp1 is
  port (in1    : in integer;
        result : out integer);
  state (c1_state : integer);
end comp1;

entity comp2 is
  port (in1, in2 : in integer;
        result   : out integer);
  state (c2_state : integer);
end comp2;

architecture struct of system is
  component comp1
    port (in1    : in integer;
          result : out integer);
  end component;

  component comp2
    port (in1, in2 : in integer;
          result   : out integer);
  end component;

  signal x, y : integer;
begin
  A : comp1 port map (sys_in, x);
  B : comp1 port map (x, y);
  C : comp2 port map (x, y, sys_out);
end struct;

Figure 8: VSPEC entities used to explain the differences between abstract, component and concrete state.

Since the struct architecture has more than one instance of the comp1 entity, the state variables of comp1 must be renamed to form the component state. This renaming avoids conflicts when forming the concrete state of the struct architecture. To simplify matters, we will always rename a component's state variables even if there is only one instance of an entity in the architecture. A number of renaming functions could be chosen, but the one used here is the state variable name in the abstract state subscripted with the instance label from the architecture. In the equations below, $[n/m]$ replaces the name $n$ by the name $m$. The component states of the components in the struct architecture are:

$$\mathrm{COMPONENT}_{A} = \mathrm{ABSTRACT}_{\mathit{comp1}}[\mathit{in1}/\mathit{sys\_in},\ \mathit{result}/x,\ \mathit{c1\_state}/\mathit{c1\_state}_A] = \{\mathit{sys\_in},\ x,\ \mathit{c1\_state}_A\}$$

$$\mathrm{COMPONENT}_{B} = \mathrm{ABSTRACT}_{\mathit{comp1}}[\mathit{in1}/x,\ \mathit{result}/y,\ \mathit{c1\_state}/\mathit{c1\_state}_B] = \{x,\ y,\ \mathit{c1\_state}_B\}$$

$$\mathrm{COMPONENT}_{C} = \mathrm{ABSTRACT}_{\mathit{comp2}}[\mathit{in1}/x,\ \mathit{in2}/y,\ \mathit{result}/\mathit{sys\_out},\ \mathit{c2\_state}/\mathit{c2\_state}_C] = \{x,\ y,\ \mathit{sys\_out},\ \mathit{c2\_state}_C\}$$
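The renaming step can be written as a small function over name sets. This Python sketch is an illustration, not part of VSPEC: the hypothetical port_map argument pairs each formal port with the signal it is connected to, and state variables receive the instance label as a suffix (c1_state_A stands for the subscripted name $\mathit{c1\_state}_A$).

```python
def component_state(abstract_names, port_map, state_vars, label):
    """Rename an entity's abstract state names for one component instance."""
    renaming = dict(port_map)                  # formal port -> connected signal
    renaming.update({v: f"{v}_{label}" for v in state_vars})
    return {renaming.get(name, name) for name in abstract_names}


ABSTRACT_comp1 = {"in1", "result", "c1_state"}
ABSTRACT_comp2 = {"in1", "in2", "result", "c2_state"}

COMPONENT_A = component_state(ABSTRACT_comp1,
                              {"in1": "sys_in", "result": "x"}, ["c1_state"], "A")
COMPONENT_B = component_state(ABSTRACT_comp1,
                              {"in1": "x", "result": "y"}, ["c1_state"], "B")
COMPONENT_C = component_state(ABSTRACT_comp2,
                              {"in1": "x", "in2": "y", "result": "sys_out"},
                              ["c2_state"], "C")
```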


We are now ready to form the concrete state of the struct architecture for the system entity. The concrete state is simply the union of the component states for each component in the architecture:

$$\mathrm{CONCRETE}^{\mathit{system}}_{\mathit{struct}} = \mathrm{COMPONENT}_A \cup \mathrm{COMPONENT}_B \cup \mathrm{COMPONENT}_C = \{\mathit{sys\_in},\ \mathit{sys\_out},\ x,\ y,\ \mathit{c1\_state}_A,\ \mathit{c1\_state}_B,\ \mathit{c2\_state}_C\}$$
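Because states are shown here as sets of names, forming the concrete state is a plain set union. The Python sketch below hard-codes the three component states listed above. It also illustrates why state variables are renamed: without the instance suffix, the two comp1 instances would collapse into a single c1_state.

```python
COMPONENT_A = {"sys_in", "x", "c1_state_A"}
COMPONENT_B = {"x", "y", "c1_state_B"}
COMPONENT_C = {"x", "y", "sys_out", "c2_state_C"}

# Shared signals such as x and y appear once: the union records the connections.
CONCRETE = COMPONENT_A | COMPONENT_B | COMPONENT_C

# Without per-instance renaming, A and B would both contribute "c1_state"
# and the union would wrongly merge two distinct state variables.
UNRENAMED = ({"sys_in", "x", "c1_state"}
             | {"x", "y", "c1_state"}
             | {"x", "y", "sys_out", "c2_state"})
```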

Since an abstract architecture represents a refinement of the requirements specified by VSPEC, an abstraction function can be defined to map the concrete state of the architecture to the abstract state defined by the VSPEC description.

Together, the abstract, component and concrete states represent the state of a VSPEC component. The examples in Sections 3.3.3 and 4 use these definitions to describe how a VSPEC description behaves.
3.3.2 Transform Definition

The  transform  performed  by  a  VSPEC  architecture  is  defined  by  the  sensitive  to,  requires  and 
ensures  clauses.  The  formal  definition  of  the  requires  and  ensures  clauses  was  discussed  in 
Section  2.  It  is  very  similar  to  the  transform  defined  by  a  traditional  Larch  interface  language.  As 
described  in  Section  3.2,  the  sensitive  to  clause  is  used  to  synchronize  components  and  define 
when  the  requires  clause  predicate  must  be  satisfied. 

Formally, synchronization is easily represented using a traditional process algebra such as CSP [11]. Events are defined as changes in the state of the entity. Assume that F is a function between two states of entity P that implements the requirements specified in P's requires and ensures clauses (i.e. $F(S_t)$ satisfies Equation 2). The process defined by entity P with a sensitive to predicate $S$ in state $S_t$ is:

$$P_{S_t} ::= (t : \mathit{SEN} \rightarrow P_{F(S_t)}) \qquad (3)$$

where SEN is the set of states that satisfy P's sensitive to clause: $\mathit{SEN} = \{t \mid S(t)\}$. Thus, a process in state $S_t$ first waits for its sensitive to clause to be satisfied and then behaves like the same process in the state defined by applying F to the current state.

Equation 3 defines a CSP process that describes the behavior of a single VSPEC entity. CSP's concurrency operator (||) is used to define a process that describes the behavior of an architecture of VSPEC components. Let $P_0, P_1, \ldots, P_n$ be the processes represented by Equation 3 for the set of VSPEC component instances in architecture V. The process that represents architecture V is:

$$V = P_0 \parallel P_1 \parallel \cdots \parallel P_n \qquad (4)$$

Thus, each component in the architecture executes in parallel. Since a component activates only when its sensitive to clause predicate is true, this predicate is used to synchronize component execution.
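Equations 3 and 4 can be given a rough operational reading with Python generators. In this sketch each process receives the current state, contributes F(state) when the state is in SEN and nothing otherwise, and the parallel composition offers the same state to every process and merges their contributions into the next state. This lock-step reading is an assumption made for illustration; it does not model CSP's general synchronization semantics, and the processes doubler and counter are hypothetical.

```python
def process(sensitive, F):
    """Equation 3: wait for a state in SEN, then contribute F(state)."""
    contribution = None
    while True:
        state = yield contribution
        contribution = F(state) if sensitive(state) else None


def parallel(*procs):
    """Equation 4: P0 || ... || Pn, stepped in lock-step on a shared state."""
    for p in procs:
        next(p)  # advance each generator to its first yield

    def step(state):
        nxt = dict(state)
        for p in procs:
            update = p.send(state)  # every process sees the same current state
            if update:
                nxt.update(update)
        return nxt

    return step


# Two hypothetical processes over a shared state.
doubler = process(lambda s: s["go"], lambda s: {"a": s["a"] * 2})
counter = process(lambda s: True, lambda s: {"n": s["n"] + 1})
V = parallel(doubler, counter)
```

Calling V on a state returns the next state; a process whose activation condition is false leaves its outputs untouched, matching the "no contribution" rule.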




entity example is
  port (i: in integer; o: out integer);
end example;

architecture structural of example is
  component c1
    port (input: in integer;
          output: out integer);
  end component;
  component c2
    port (input: in integer;
          output: out integer);
  end component;
  signal y : integer;
begin
  b1: c1 port map (i, y);
  b2: c2 port map (y, o);
end structural;

entity c1 is
  port (x: in integer; z: out integer);
  modifies z;
  sensitive to x'event;
  requires I1(x);
  ensures O1(x, z'post);
end c1;

entity c2 is
  port (x: in integer; z: out integer);
  modifies z;
  sensitive to x'event;
  requires I2(x);
  ensures O2(x, z'post);
end c2;

Figure 9: Specification of two components connected serially.


3.3.3 Formal Model Example

This section presents a simple example to explain how the concrete state of a VSPEC architecture changes as its inputs are modified by external components. Consider the architecture shown in Figure 9. The abstract, component and concrete states of the elements of this architecture are:

$$\mathrm{ABSTRACT}_{c1} = \{x, z\}$$

$$\mathrm{ABSTRACT}_{c2} = \{x, z\}$$

$$\mathrm{COMPONENT}_{c1} = \{i, y\}$$

$$\mathrm{COMPONENT}_{c2} = \{y, o\}$$

$$\mathrm{CONCRETE}^{\mathit{example}}_{\mathit{structural}} = \{i, o, y\}$$


The transformation performed by an architecture is defined from the components comprising it. Formally, the component requirements for c1 and c2 are defined as:

$$\forall x : \mathit{integer},\ \exists z : \mathit{integer} \cdot I_1(x) \Rightarrow O_1(x, z'\mathit{post})$$

$$\forall x : \mathit{integer},\ \exists z : \mathit{integer} \cdot I_2(x) \Rightarrow O_2(x, z'\mathit{post})$$


The renaming defined by the architecture that is used to create the component state from the abstract state of an architecture can also be applied to these two equations. In this example, this defines the following logical requirements for c1 and c2:

$$\forall i : \mathit{integer},\ \exists y : \mathit{integer} \cdot I_1(i) \Rightarrow O_1(i, y'\mathit{post})$$

$$\forall y : \mathit{integer},\ \exists o : \mathit{integer} \cdot I_2(y) \Rightarrow O_2(y, o'\mathit{post})$$


The renaming function is also applied to the modifies, state and sensitive to clauses of c1 and c2. After this renaming, the logical definitions of each component are expressed in the same name space as the concrete state of the system.

Assume that a, b and c are integer constants and that f(x) and g(x) are functions that satisfy the requirements for c1 and c2, respectively. Let the initial concrete state of the system be $S_0 = \{i \mapsto a,\ y \mapsto b,\ o \mapsto c\}$ and let i'event be true and y'event be false. This means that c1's sensitive to clause is satisfied and c1's pre-condition must hold. Component c1 will then make its post-condition hold in the next state. Instantiating the requirements for c1 gives:

$$\exists z : \mathit{integer} \cdot I_1(a) \Rightarrow O_1(a, z) \qquad (5)$$

Knowing that f(x) satisfies c1's requirements and assuming $I_1(a)$ is true implies that $O_1(a, f(a))$ is also true. Additionally, y'event is known to be false, so c2 maintains its state and o does not change in the next state. Thus, one potential next state for this system is $S_1 = \{i \mapsto a,\ y \mapsto f(a),\ o \mapsto c\}$. Because the function f is one of potentially many functions satisfying c1, we cannot claim that this is the only possible next state.

Since y changed value from $S_0$ to $S_1$ (assuming $f(a) \neq b$), the predicate y'event is true in $S_1$. Additionally, i did not change value in $S_1$, implying that i'event is false in $S_1$. Thus, only component c2 activates in state $S_1$.

Using the same reasoning used for $S_1$, values for $S_2$ can be produced. Assuming that $I_2(f(a))$ holds and knowing that g(x) satisfies c2's requirements makes $O_2(f(a), g(f(a)))$ true. The input value i has not changed, so c1 maintains its state, implying that y does not change, and $g(f(a))$ satisfies c2's output condition. Thus, $S_2 = \{i \mapsto a,\ y \mapsto f(a),\ o \mapsto g(f(a))\}$ is a potential next state for the system.
An interesting exercise is defining what happens when the input value i changes between states $S_0$ and $S_1$. Assume that i changes value from a to d, making $S_1' = \{i \mapsto d,\ y \mapsto f(a),\ o \mapsto c\}$. Now i'event is true in $S_1'$ (and y'event is also true, since y changed from b to f(a)), so both components execute on values from $S_1'$. In this case, $S_2' = \{i \mapsto d,\ y \mapsto f(d),\ o \mapsto g(f(a))\}$. Note that the value of o does not change from the previous example because the next state is defined only on variables defined in the current state. Using this model eliminates the difficulty caused by instantaneous feedback and "pipelined" update functions. VHDL solves this same problem by allowing an unbounded number of delta delays within a single simulation time step.
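The traces above can be replayed with concrete choices for the unspecified functions. The Python sketch below assumes f(x) = x + 1 and g(x) = 2x, that $I_1$ and $I_2$ always hold, and picks a = 3, b = 0, c = 0 and d = 5; a hypothetical previous state is supplied so that i'event is true and y'event is false in $S_0$. Each active component reads only the current state, which is what keeps o at g(f(a)) in the second scenario.

```python
def events(prev, cur):
    """'event for each name: True if its value changed since the previous state."""
    return {name: prev[name] != cur[name] for name in cur}


def step(prev, cur, components):
    """Next concrete state: active components compute outputs from `cur` only."""
    ev = events(prev, cur)
    nxt = dict(cur)
    for trigger, source, target, fn in components:
        if ev[trigger]:  # the renamed sensitive to clause: <trigger>'event
            nxt[target] = fn(cur[source])
    return nxt


def f(x):
    return x + 1   # assumed implementation of c1


def g(x):
    return 2 * x   # assumed implementation of c2


COMPONENTS = [("i", "i", "y", f),   # c1 after renaming x -> i, z -> y
              ("y", "y", "o", g)]   # c2 after renaming x -> y, z -> o

prev = {"i": 2, "y": 0, "o": 0}     # chosen so that only i'event holds in S0
S0 = {"i": 3, "y": 0, "o": 0}       # a = 3, b = 0, c = 0
S1 = step(prev, S0, COMPONENTS)     # only c1 activates
S2 = step(S0, S1, COMPONENTS)       # only c2 activates

S1p = {"i": 5, "y": S1["y"], "o": S1["o"]}   # i changes to d = 5 before S1
S2p = step(S0, S1p, COMPONENTS)              # both components activate
```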

3.4  Generating  Proof  Obligations 

The VSPEC formal model can be used to verify that a system's abstract architecture description satisfies the requirements described by the VSPEC specification of the system. This verification provides evidence that the abstract architecture description satisfies the abstract VSPEC specification. Finding such evidence depends on: (1) having the system requirements I and O; and (2) relating a concrete state produced by the abstract architecture to the abstract state specified for the system. A system's VSPEC description provides I and O. The abstraction function from the concrete to the abstract state provides the means for comparing the abstract and concrete states.

Weak bisimulation [19] is used as the correctness criterion when attempting to verify that an abstract architecture satisfies a VSPEC description. As shown in Figure 10, weak bisimulation requires that some sequence of state changes in the concrete state of the system result in the correct single state change in the abstract state. Only the first and last of the concrete states are significant. The system may pass through any concrete state as long as the abstraction function applied to the final concrete state results in the correct abstract state as defined by the abstract specification.

Figure 10: Concrete state changes associated with a single abstract state change.

In CSP, the sequence of states a VSPEC entity passes through is called a trace. A CSP trace of process P is a finite sequence of symbols representing the events processed by P. VSPEC events are changes in state, and they are represented in a trace by the state the entity changes to. Thus, a VSPEC entity satisfies the weak bisimulation criterion if two conditions hold for all traces of the abstract architecture. The first condition is that the abstraction function applied to the initial element of each trace must result in an abstract state that satisfies the abstract pre-condition. The second condition is that the final element of each trace must either have an abstract projection that satisfies the abstract post-condition, or there must be some legal sequence of states that can be appended to the trace to form another trace. This ensures that the concrete state eventually reaches a state where the abstract specification is satisfied.
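For a finite trace, the two conditions can be checked directly. The Python sketch below follows the style of the example in Section 3.3.3 but assumes a hypothetical abstract specification for the example entity (o'post = g(f(i)) with f(x) = x + 1 and g(x) = 2x), an abstraction function that simply forgets the internal signal y, and a trivially true abstract pre-condition. A failing result on a prefix only means the trace must still be extended; a finite check cannot decide whether a suitable extension exists.

```python
def weakly_satisfies(trace, abstraction, pre, post):
    """Check one complete finite concrete trace against an abstract spec.

    Only the first and last concrete states matter: the first must satisfy the
    abstract pre-condition, and the pair (first, last) the post-condition.
    """
    first, last = abstraction(trace[0]), abstraction(trace[-1])
    return pre(first) and post(first, last)


def abstraction(state):
    return {"i": state["i"], "o": state["o"]}    # forget the internal signal y


def pre(a):
    return True


def post(before, after):
    return after["o"] == 2 * (before["i"] + 1)   # hypothetical o'post = g(f(i))


trace = [{"i": 3, "y": 0, "o": 0},
         {"i": 3, "y": 4, "o": 0},
         {"i": 3, "y": 4, "o": 8}]
```

The intermediate state, where y has changed but o has not, is never compared with the abstract specification, which is exactly the freedom weak bisimulation grants.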

4  Examples 

This section presents two examples that illustrate how VSPEC can be used to describe an abstract architecture. The first example is a simple tri-state buffer description that is used to define a simple 2-input multiplexor. This example illustrates what happens when multiple sources drive a single value in a VSPEC abstract architecture. The second example is the description of a simple CPU called the Move Machine. This example shows a VSPEC description that is decomposed into an abstract architecture.
entity buffer is
  port (input: in integer;
        control: in boolean;
        output: out integer);
  sensitive to control'event or input'event;
  ensures control implies output'post = input;
end buffer;

Figure 11: VSPEC description of a simple buffer.

entity mux is
  port (in1, in2: in integer;
        select: in boolean;
        output: out integer);
  sensitive to in1'event or
               in2'event or select'event;
  ensures
    (select and output'post = in1) or
    (not select and output'post = in2);
end mux;

architecture struct of mux is
  component buffer
    port (input: in integer;
          control: in boolean;
          output: out integer);
  end component;
  component not
    port (input: in boolean;
          output: out boolean);
  end component;

  signal select_inv : boolean;
begin
  b1 : buffer
    port map (in1, select, output);
  b2 : buffer
    port map (in2, select_inv, output);
  h1 : not
    port map (select, select_inv);
end struct;

Figure 12: VSPEC and abstract architecture description of a 2-input mux.

4.1 Buffer and Multiplexor Example

A VSPEC description of a simple buffer is shown in Figure 11. In this example, input and output are both integers, but the specification could also be used if input and output were of any other type. When control is true, the device passes input to output. When control is false, the device places no requirements on the value of output in the next state. The specification allows output to maintain its current value in the next state, but it also allows an external device to change the value of output. Consider using this buffer as a component in the abstract architecture description of the multiplexor in Figure 12.

This figure shows both a VSPEC description of a multiplexor and a refinement of this description into an abstract architecture. The VSPEC entity mux is a straightforward description of a multiplexor. The struct architecture uses two instances of buffer and a not gate to decompose the multiplexor into an abstract architecture.

Careful examination of this description reveals a very subtle but important point about VSPEC specifications and multiply driven signals. If a component description does not restrict the value of an output signal in the next state, other components in the system can still change the value of this signal without violating the component description. Suppose that the concrete state of the architecture is:

$$\mathrm{CONCRETE}^{\mathit{mux}}_{\mathit{struct}} = \{\mathit{in1} \mapsto 7,\ \mathit{in2} \mapsto 3,\ \mathit{select} \mapsto \mathit{true},\ \mathit{output} \mapsto 7,\ \mathit{select\_inv} \mapsto \mathit{false}\}$$

so that the abstract state of buffer instance b1 is:

$$\mathrm{ABSTRACT}_{b1} = \{\mathit{input} \mapsto 7,\ \mathit{control} \mapsto \mathit{true},\ \mathit{output} \mapsto 7\}$$

Assume that some external device changes the select input to false. This causes buffer instance b1's control input to change to false, which activates the buffer. This device must now make its ensures clause true in the next state. Since control is false, the ensures clause will be true in the next state for any value of output. Thus, buffer instance b2 can change the output signal of the architecture to 3 without violating b1's specification. The next state of the device is:

$$\mathrm{CONCRETE}^{\mathit{mux}}_{\mathit{struct}} = \{\mathit{in1} \mapsto 7,\ \mathit{in2} \mapsto 3,\ \mathit{select} \mapsto \mathit{false},\ \mathit{output} \mapsto 3,\ \mathit{select\_inv} \mapsto \mathit{true}\}$$

Thus, the output signal has changed value even though the b1 buffer instance does not cause it to do so. Even though b1 does not force a change in state, it does not prohibit one either. An external device (buffer instance b2) has caused the output signal to change value, and the specification of b1 allows this change to occur.
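One way to see why b1 does not block the change is to treat each active component's ensures clause as a constraint on the next value of output and to intersect the allowed values. The Python sketch below makes that reading explicit as an assumption (the next state must satisfy every active component's ensures clause) and uses a small finite range as a stand-in for integer. The helper names b1 and b2 model the two buffer instances in the state where select is false and select_inv is true.

```python
def buffer_ensures(control, inp, output_post):
    """buffer: control implies output'post = input."""
    return (not control) or output_post == inp


def allowed_outputs(candidates, constraints):
    """Values of output satisfying every constraint at once."""
    return {v for v in candidates if all(ok(v) for ok in constraints)}


CANDIDATES = range(10)   # finite stand-in for integer


def b1(v):
    return buffer_ensures(False, 7, v)   # control = select = false


def b2(v):
    return buffer_ensures(True, 3, v)    # control = select_inv = true


only_b1 = allowed_outputs(CANDIDATES, [b1])    # b1 alone permits any value
both = allowed_outputs(CANDIDATES, [b1, b2])   # b2 forces output to 3
```

With both buffers disabled, the same computation yields every candidate value, which is the situation in which a VSPEC specification permits a signal to take more than one value.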

This description may not seem correct to an experienced VHDL user because the output signal is driven by two sources but no resolution function is specified. Although this is illegal in VHDL, it is allowed in VSPEC. In most cases, the CSP statement that defines a VSPEC entity's contribution to the next state of the system will define a single value for every signal, but a VSPEC description may allow more than one value for a specific signal. This is legal VSPEC because VSPEC is a specification language, not a simulation language like VHDL. This implies that a VSPEC specification does not need to deterministically define a single value for every signal in the system. It is certainly possible to do this with VSPEC by defining the requirements of resolution functions, but a VSPEC specification could allow a signal to be driven to two (or more) different values. In these cases, a designer implementing the specification may choose to drive the signal to any of its allowed values.


4.2  The  Move  Machine 

A more complex example is the specification of a Move Machine [22]. The Move Machine is a simple CPU that moves data from one memory location to another. It uses four instructions (jump, load register from memory, store register to memory, and halt) and four addressing modes (absolute, immediate, indirect and relative). Although the Move Machine is a simple device, its structure reflects how a more complex system might be represented.

The first step in specifying the Move Machine is representing it as a simple instruction interpreter (Figure 13). At this level, only one VSPEC annotated entity describes the execution of each instruction and addressing mode. This entity contains state variables to store the current register contents and the value of the instruction pointer. The sensitive to clause states that the machine activates when its start or reset input is on or when the value of the instruction pointer changes. The rather complex ensures clause predicate defines how the machine behaves for each instruction and addressing mode. An external entity would use this component by first applying the reset signal and then the start signal. This causes the machine to begin executing the instruction in memory location 0. The result of each instruction (except halt) causes the contents of the instruction pointer to change, which activates the machine again in the next state. This continues until a halt instruction is processed, causing the machine to stop.
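As an informal cross-check of the instruction cases, the Python sketch below executes a program one instruction per activation. It is a reading of the load, store and jump cases of Figure 13, not a definitive semantics: words are assumed to decode into a hypothetical Word tuple, non-instruction memory cells hold plain integers, every store case is treated as a memory update, and halt simply stops the loop. Mode names follow the Instruction trait (Figure 15), which spells the absolute mode abs.

```python
from typing import List, NamedTuple, Optional, Union


class Word(NamedTuple):
    """Hypothetical decoded instruction word (field names follow Figure 15)."""
    ins: str    # "halt", "jump", "load" or "store"
    am: str     # "abs", "imm", "ind" or "rel" (Figure 13 writes abs as ab)
    rnum: int
    addr: int


Cell = Union[Word, int]


def step(mem: List[Cell], reg: List[Cell], ip: int) -> Optional[int]:
    """Execute mem[ip]; update mem and reg in place; return next ip or None."""
    w = mem[ip]
    if w.ins == "halt":
        return None
    if w.ins == "jump":
        return w.addr
    if w.ins == "load":
        if w.am == "abs":
            reg[w.rnum] = w.addr
        elif w.am == "imm":
            reg[w.rnum] = mem[ip + 1]
        elif w.am == "ind":
            reg[w.rnum] = mem[w.addr]
        else:  # "rel"
            reg[w.rnum] = mem[ip + w.addr]
    else:  # "store"
        if w.am == "abs":
            mem[w.addr] = reg[w.rnum]
        elif w.am == "imm":
            mem[ip + 1] = reg[w.rnum]
        elif w.am == "ind":
            mem[mem[w.addr]] = reg[w.rnum]
        else:  # "rel"
            mem[ip + w.addr] = reg[w.rnum]
    # An immediate operand occupies the following word, so ip advances by 2.
    return ip + (2 if w.am == "imm" else 1)


def run(mem: List[Cell], nregs: int = 16, max_steps: int = 1000) -> List[Cell]:
    """Apply reset (ip = 0), then start, and execute until halt."""
    reg: List[Cell] = [0] * nregs
    ip: Optional[int] = 0
    for _ in range(max_steps):
        ip = step(mem, reg, ip)
        if ip is None:
            return reg
    raise RuntimeError("no halt instruction reached")


prog = [Word("load", "imm", 0, 0), 42,
        Word("store", "abs", 0, 10), Word("halt", "abs", 0, 0)] + [0] * 8
regs = run(prog)
```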

use work.mm_types.all;
entity mm is
  port (reset, start : in boolean;
        mem : inout memory);
  state (ip : address;
         reg : regfile);
  sensitive to start or reset or
               ip'event;
  ensures
    (reset and ip'post = 0) or
    (not reset and
      ((ins(mem(ip)) = jump and
          ip'post = addr(mem(ip)))
       or (ins(mem(ip)) = load and
          ((am(mem(ip)) = ab and
              reg(rnum(mem(ip)))'post =
                addr(mem(ip))) or
           (am(mem(ip)) = imm and
              reg(rnum(mem(ip)))'post =
                mem(ip + 1)) or
           (am(mem(ip)) = ind and
              reg(rnum(mem(ip)))'post =
                mem(addr(mem(ip)))) or
           (am(mem(ip)) = rel and
              reg(rnum(mem(ip)))'post =
                mem(ip + addr(mem(ip))))))
       or (ins(mem(ip)) = store and
          ((am(mem(ip)) = ab and
              mem(addr(mem(ip)))'post =
                reg(rnum(mem(ip)))) or
           (am(mem(ip)) = imm and
              mem(ip + 1) =
                reg(rnum(mem(ip)))) or
           (am(mem(ip)) = ind and
              mem(mem(addr(mem(ip)))) =
                reg(rnum(mem(ip)))) or
           (am(mem(ip)) = rel and
              mem(ip + addr(mem(ip))) =
                reg(rnum(mem(ip)))))))
      and ((ins(mem(ip)) = store or
            ins(mem(ip)) = load) and
           ((am(mem(ip)) /= imm and
               ip'post = ip + 1)
            or (am(mem(ip)) = imm and
               ip'post = ip + 2))));
end mm;

Figure 13: The Move Machine requirements represented as an instruction interpreter.

package mm_types is
  type address;
  type word;
  includes Instruction(word, address, integer);
  type control is (fetch, decode, execute, halt);
  type memory is array (0 to 256) of word;
  type regfile is array (0 to 15) of word;
end mm_types;

Figure 14: Package declaring types used in the Move Machine.

One thing to note about this specification is the use clause on the first line. In VHDL, types and functions can be declared in separate packages. These packages are then included in entity and architecture descriptions with the use clause. The mm_types package referenced in this example is shown in Figure 14. An interesting aspect of this package is the use of incomplete types to specify address and word. VHDL uses incomplete types to allow references to a type before the type is completely defined (such as in an access type). One use of this is to allow a record to contain a pointer to another record of the same type (i.e. to construct a list).

In VSPEC, incomplete types are used for a slightly different purpose. The type definitions for address and word are incomplete because no implementation is defined. They are declared to be types, but no additional information is provided. These incomplete types will be given characteristics by the specification, but no specific implementation is implied or mandated. Thus, the designer must select an implementation at a lower abstraction level. Using incomplete types allows the designer to specify a type's characteristics without specifying its implementation.

The characteristics of the address and word types are defined in the LSL Instruction trait. This trait is included in mm_types using a VSPEC includes clause (see Section 2) and is shown in Figure 15. The Instruction trait provides definitions for conversion functions that allow instructions, addressing modes, register numbers and addresses to be obtained from memory words. In the final format of the Move Machine instructions (not shown in this paper), this would be implemented by defining which bits of a memory word encode the instruction, register number and address. However, when specifying the initial requirements of the device, such details should not be considered. All that must be specified is that instructions, register numbers and addresses can be obtained from memory words. This is exactly what the LSL description allows us to say.
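The separation between a type's characteristics and its implementation maps naturally onto an abstract interface. In the Python sketch below, InstructionTrait only names the four operators introduced by the Instruction trait, and PackedWord16 is one hypothetical lower-level choice: a 16-bit word laid out as instruction, mode, register number and address fields. The bit layout is invented for illustration and is not the Move Machine's actual format.

```python
from abc import ABC, abstractmethod

MODES = ("abs", "imm", "ind", "rel")
INSTRUCTIONS = ("halt", "jump", "load", "store")


class InstructionTrait(ABC):
    """Operators introduced by the Instruction trait; no word layout is fixed."""

    @abstractmethod
    def am(self, w):
        """Addressing mode held in word w."""

    @abstractmethod
    def addr(self, w):
        """Address held in word w."""

    @abstractmethod
    def ins(self, w):
        """Instruction held in word w."""

    @abstractmethod
    def rnum(self, w):
        """Register number held in word w."""


class PackedWord16(InstructionTrait):
    """Hypothetical layout: [ins:2 | am:2 | rnum:4 | addr:8] in a 16-bit int."""

    def ins(self, w):
        return INSTRUCTIONS[(w >> 14) & 0b11]

    def am(self, w):
        return MODES[(w >> 12) & 0b11]

    def rnum(self, w):
        return (w >> 8) & 0xF

    def addr(self, w):
        return w & 0xFF

    def encode(self, ins, am, rnum, addr):
        if not (0 <= rnum < 16 and 0 <= addr < 256):
            raise ValueError("field out of range")
        return ((INSTRUCTIONS.index(ins) << 14) | (MODES.index(am) << 12)
                | (rnum << 8) | addr)


enc = PackedWord16()
w = enc.encode("load", "ind", 3, 200)
```

Code written against InstructionTrait alone stays valid for any other layout, which mirrors how the VSPEC specification constrains address and word without fixing their representation.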

Instruction(W, A, N): trait
  includes
    Natural(N)
  mode enumeration of abs, imm, ind, rel
  instruction enumeration of halt, jump, load, store
  introduces
    am: W -> mode
    addr: W -> A
    ins: W -> instruction
    rnum: W -> N

Figure 15: LSL support functions for treating memory contents as instructions. Basic types and conversions are defined.

Once the Move Machine's initial requirements are defined, the device can be broken up into an abstract architecture, and each of the components can be synthesized individually. For a CPU such as the Move Machine, one such architecture is the canonical fetch-decode-execute structure. An instruction is retrieved, the addressing modes are decoded and dereferenced, and the instruction is executed on its operands. Effectively, the Move Machine is now three components that execute in sequence.

Figure 16 shows the fetch-decode-execute architecture for the Move Machine. The signals mem, reg, IP, IR, EA and CNTL exchange memory, register and control values between components. The requires and ensures clauses for each component describe the transformations performed on memory and register values. The sensitive to clauses use the control value to indicate which component(s) should be active.

Each component's sensitive to clause indicates that it should be active when its execution phase begins. As with the instruction interpreter, the machine starts by turning on the reset signal. This activates the fetch component, which sets the instruction pointer to 0. After reset turns off, all components are inactive until the start signal is asserted. Fetch's sensitive to clause is the only one satisfied by this action, so fetch is the only component that activates. All other components have no effect on the concrete state of the architecture. The fetch component retrieves the current instruction from memory and places it in the instruction register (IR). It also sets the cntl signal to decode.
use work.mm_types.all;
architecture mm_fde of mm is
  component fetch
    port (reset, start : in boolean;
          mem  : in memory;
          ip   : inout address;
          ir   : out word;
          cntl : inout control);
  end component;
  component decode
    port (mem  : in memory;
          ip   : in address;
          ir   : in word;
          ea   : out address;
          cntl : inout control);
  end component;
  component execute
    port (mem  : inout memory;
          ip   : inout address;
          ir   : in word;
          reg  : inout regfile;
          ea   : in address;
          cntl : inout control);
  end component;

  signal CNTL : control;
  signal IP   : address;
  signal IR   : word;
  signal EA   : address;
  signal reg  : regfile;
begin
  b1: fetch   port map (reset, start, mem, IP, IR, CNTL);
  b2: decode  port map (mem, IP, IR, EA, CNTL);
  b3: execute port map (mem, IP, IR, reg, EA, CNTL);
end mm_fde;

use work.mm_types.all;
entity fetch is
  port (reset, start : in boolean;
        mem  : in memory;
        ip   : inout address;
        ir   : out word;
        cntl : inout control);
  sensitive to start or reset or cntl=fetch;
  modifies ip, ir, cntl;
  requires true;
  ensures (reset and ip'post = 0)
       or (not reset and ir'post = mem(ip) and cntl'post = decode);
end fetch;

use work.mm_types.all;
entity decode is
  port (mem  : in memory;
        ip   : in address;
        ir   : in word;
        ea   : out address;
        cntl : inout control);
  sensitive to cntl=decode;
  modifies ea, cntl;
  requires true;
  ensures ((am(ir) = abs and ea'post = addr(ir)) or
           (am(ir) = imm and ea'post = ip+1) or
           (am(ir) = ind and ea'post = mem(addr(ir))) or
           (am(ir) = rel and ea'post = ip+addr(ir)))
          and cntl'post = execute;
end decode;

use work.mm_types.all;
entity execute is
  port (mem  : inout memory;
        ip   : inout address;
        ir   : in word;
        reg  : inout regfile;
        ea   : in address;
        cntl : inout control);
  sensitive to cntl=execute;
  modifies mem, reg, ip, cntl;
  requires true;
  ensures (ins(ir) = jump and ip'post = addr(ir) and cntl'post = fetch)
       or (ins(ir) = load and reg(rnum(ir))'post = mem(ea) and cntl'post = fetch and
           ((am(ir) = imm and ip'post = ip+2) or (am(ir) /= imm and ip'post = ip+1)))
       or (ins(ir) = store and mem(ea)'post = reg(rnum(ir)) and cntl'post = fetch and
           ((am(ir) = imm and ip'post = ip+2) or (am(ir) /= imm and ip'post = ip+1)))
       or (ins(ir) = halt and cntl'post = halt);
end execute;

Figure 16: Fetch-decode-execute architecture for the Move Machine CPU.
At this point, decode is the only component whose sensitive to clause is satisfied. This component calculates the effective address based on the addressing mode specified by the instruction in the IR and sets the cntl signal to execute. The execute component then manipulates the registers and memory based on the current instruction. When a load, store or jump instruction is executed, execute sets the cntl signal to fetch, which activates the fetch component, and the process starts again. If the halt instruction is processed, execute sets cntl to halt. This makes all three components' sensitive to clauses false. The concrete state of the architecture then does not change until something outside of mm (such as activating reset) changes it.
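VSPEC descriptions are not executable. One way to sanity-check the intended cycle is therefore a small executable model, sketched below in Python. It illustrates the cycle under several assumptions and is not a translation of Figure 16:

- Memory words are either a hypothetical `Instr` record or a plain integer.
- Reset and start are modeled simply by starting with `ip = 0` and `cntl = "fetch"`.
- `store` advances `ip` exactly as `load` does: by 2 in immediate mode, otherwise by 1.

Each function produces one post-state that satisfies the corresponding component's ensures clause. `run` activates whichever component's `sensitive to cntl=...` condition holds.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass(frozen=True)
class Instr:
    """Hypothetical concrete form of an instruction word."""
    ins: str        # "halt" | "jump" | "load" | "store"
    am: str         # "abs" | "imm" | "ind" | "rel"
    rnum: int = 0   # register number
    addr: int = 0   # address field


Word = Union[Instr, int]
State = Dict[str, object]


def fetch(s: State) -> State:
    # not reset: ir'post = mem(ip) and cntl'post = decode
    return {**s, "ir": s["mem"][s["ip"]], "cntl": "decode"}


def decode(s: State) -> State:
    ir, ip, mem = s["ir"], s["ip"], s["mem"]
    if ir.am == "abs":
        ea = ir.addr
    elif ir.am == "imm":
        ea = ip + 1              # operand sits in the word after the instruction
    elif ir.am == "ind":
        ea = mem[ir.addr]        # addressed word holds the operand's address
    else:                        # "rel"
        ea = ip + ir.addr
    return {**s, "ea": ea, "cntl": "execute"}


def execute(s: State) -> State:
    ir, ip, ea = s["ir"], s["ip"], s["ea"]
    mem, reg = list(s["mem"]), list(s["reg"])
    step = 2 if ir.am == "imm" else 1
    if ir.ins == "jump":
        return {**s, "ip": ir.addr, "cntl": "fetch"}
    if ir.ins == "load":
        reg[ir.rnum] = mem[ea]
        return {**s, "reg": reg, "ip": ip + step, "cntl": "fetch"}
    if ir.ins == "store":
        mem[ea] = reg[ir.rnum]
        return {**s, "mem": mem, "ip": ip + step, "cntl": "fetch"}
    return {**s, "cntl": "halt"}  # halt: no sensitive to clause holds afterwards


COMPONENTS: Dict[str, Callable[[State], State]] = {
    "fetch": fetch, "decode": decode, "execute": execute}


def run(mem: List[Word], nregs: int = 2, max_steps: int = 100) -> Tuple[State, List[str]]:
    s: State = {"mem": list(mem), "reg": [0] * nregs, "ip": 0,
                "ir": None, "ea": None, "cntl": "fetch"}
    trace: List[str] = []
    for _ in range(max_steps):
        active = [name for name in COMPONENTS if s["cntl"] == name]
        if not active:           # cntl = halt: the architecture is quiescent
            break
        name = active[0]         # at most one phase matches cntl
        s = COMPONENTS[name](s)
        trace.append(name)
    return s, trace


# Load r0 with the immediate value 7, store r0 at address 6, then halt.
PROGRAM: List[Word] = [Instr("load", "imm", 0), 7,
                       Instr("store", "abs", 0, 6),
                       Instr("halt", "abs"), 0, 0, 0]
```

Running `run(PROGRAM)` cycles through fetch, decode and execute three times. It leaves 7 in both register 0 and memory word 6.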

5  Related  Work 

5.1  Software  Architecture 

The research area most closely related to abstract architecture representation in VSPEC is software architecture [8]. Research in this field has led to the development of several architecture description languages, including UniCon [23], Wright [3, 4] and Rapide [16, 17]. Each of these languages allows the definition of components and connectors to define a software architecture. This is similar to the VHDL notion of a structural architecture described in this paper.

Shaw's UniCon language [23] is one example of an architecture description language. A UniCon description consists of component and connector definitions. Each of these definitions gives a type (such as Filter or Process for components and Pipe or FileIO for connectors), association units (component players and connector roles) and an implementation for the component or connector. The primary product of the UniCon compiler is a set of Odinfiles, which are similar to makefiles and are used to construct an executable version of the described architecture. This is very different from a VSPEC abstract architecture, which is used to verify that the class of solutions defined by the architecture implements the requirements specified by the VSPEC description of the component.

The Wright architecture description language [3, 4] by Allen and Garlan is of particular interest when discussing abstract architectures in VSPEC. A Wright description consists of a collection of components interacting via instances of connector types. Each part of a Wright description is defined using a variant of CSP [11]. VSPEC uses CSP only to define communication between components, whereas Wright descriptions use CSP to define the behavior of components as well. Wright's CSP descriptions define the sequence of events that occur in a component or connector. Components and connectors interact when one component or connector observes an event provided by another. This may cause the observing component or connector to provide events that cause further interactions. These interactions are all described using CSP.

Rapide [16, 17] is an executable architecture description language designed for prototyping architectures of distributed systems. A Rapide architecture consists of three parts: a set of module specifications (called interfaces), a set of connection rules defining communication between interfaces, and a set of formal constraints that define legal patterns of communication. A Rapide architecture is executed to produce a partially ordered set of events (poset) that represents the dependencies between events in the architecture. The Rapide tools can then verify that this poset does not violate the formal constraints defined in the architecture. A major difference between Rapide and VSPEC is that VSPEC descriptions are not executable. They are intended for formal analysis.

5.2 Other VHDL-Related Specification Languages

Odyssey Research Associates (ORA) is developing Larch/VHDL, an alternative Larch interface language for VHDL [13]. Larch/VHDL is targeted at formal analysis of a VHDL description, and ORA is defining a formal semantics for VHDL using LSL. The LSL representations are used in a traditional theorem prover to verify system correctness. Larch/VHDL annotations are added to a specific VHDL description to represent proof obligations for the verification process. In contrast, a VSPEC abstract architecture represents the requirements of a class of solutions that satisfy a specification (also given in VSPEC).

Augustin and Luckham's VAL [6] is another attempt to annotate VHDL. The purpose of a VAL annotation to a VHDL description is to document the design for verification. VAL provides mechanisms for mapping a behavioral description to a structural description. Two VAL/VHDL descriptions of a design can be transformed into a self-checking VHDL program that is simulated to verify that the two descriptions implement the same function. This differs from VSPEC because VAL does not allow the description of a class of solutions that implement a specification. Instead, it allows verification that a structural description correctly maps to a behavioral description of the entity.

5.3  Larch  Interface  Languages 

Larch interface languages have been developed for a variety of programming languages, including LCL [9], Larch/C++ [15] and LM3 [14], the interface languages for C, C++ and Modula-3, respectively. Each of these languages allows the description of pre- and post-conditions for procedures and functions in a sequential programming language. The portions of these languages that allow this type of specification (i.e., requires and ensures clauses) are also found in VSPEC, where they are used to specify the transformation performed by a single component. However, since C, C++ and Modula-3 are sequential languages, their Larch interface languages do not have to deal with how Larch-specified procedures and functions interact when two procedures execute concurrently, as VSPEC entities do. At present, we are not aware of other work in the Larch community where pre- and post-conditions are used to specify the behavior of components in an abstract architecture.
6 Conclusion

6.1  Summary 

The ability to evaluate architectural decisions early in the design process enhances overall design quality by allowing architectural errors to be discovered when they are less expensive to fix. Unfortunately, VHDL does not allow evaluation until a simulatable model exists. For many complex systems, simulatable models appear late in the design process, making architectural errors difficult to correct. An alternative to simulation for evaluating architectural decisions is formal analysis of abstract architectures at the requirements level. An abstract architecture is a set of interconnected components where the requirements of each component are known but the implementation is not. This paper presented VSPEC's support for describing and evaluating abstract architectures during requirements specification.

A VSPEC abstract architecture is formed by instantiating each component in a VHDL structural architecture with a VSPEC entity. The VSPEC description of an entity includes a pre-condition, a post-condition and an activation condition that describe the entity's functional requirements. If the current state of the system satisfies the activation condition for one of the components in the abstract architecture, that component's pre-condition must hold and the component must satisfy its post-condition in the next state. A refinement of a VSPEC entity can be compared with the VSPEC specification using weak bisimulation. Weak bisimulation holds if some sequence of state changes in the refinement yields the correct single state change in the higher-level description. This method can be used to formally determine whether a VSPEC abstract architecture is a refinement of the VSPEC description of the entity it implements.
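For deterministic machines, the weak bisimulation idea can be illustrated on a single run in three steps:

1. Project each state onto the variables that both levels share.
2. Collapse consecutive repeated observations, which correspond to the refinement's internal steps.
3. Compare the two resulting sequences.

The Python sketch below does only that. It is a simplification for illustration, not a general weak bisimulation checker, which must also consider every reachable state and any nondeterministic branching. The accumulator machines and all names are hypothetical.

```python
from typing import Callable, Hashable, List, Optional, Tuple, TypeVar

S = TypeVar("S")
Step = Callable[[S], Optional[S]]  # returns None when no component is active


def run(step: Step, s: S, max_steps: int = 1000) -> List[S]:
    trace = [s]
    for _ in range(max_steps):
        nxt = step(s)
        if nxt is None:
            break
        s = nxt
        trace.append(s)
    return trace


def destutter(obs: List[Hashable]) -> List[Hashable]:
    """Drop steps that leave the observable state unchanged."""
    out: List[Hashable] = []
    for o in obs:
        if not out or out[-1] != o:
            out.append(o)
    return out


def weakly_matches(spec_step, impl_step, s0, c0, obs_spec, obs_impl) -> bool:
    ts = destutter([obs_spec(s) for s in run(spec_step, s0)])
    tc = destutter([obs_impl(c) for c in run(impl_step, c0)])
    return ts == tc


# Abstract machine: acc := acc + 2 in one step, until acc reaches 6.
def spec_step(acc: int) -> Optional[int]:
    return acc + 2 if acc < 6 else None


# Refinement: (acc, tmp, phase); "compute" writes a hidden tmp, "commit" copies it to acc.
Conc = Tuple[int, int, str]


def impl_step(c: Conc) -> Optional[Conc]:
    acc, tmp, phase = c
    if phase == "compute":
        return (acc, acc + 2, "commit") if acc < 6 else None
    return (tmp, tmp, "compute")


def buggy_step(c: Conc) -> Optional[Conc]:
    acc, tmp, phase = c
    if phase == "compute":
        return (acc, acc + 3, "commit") if acc < 6 else None
    return (tmp, tmp, "compute")
```

Here `impl_step` takes two steps for each abstract step, but after destuttering its observed `acc` values (0, 2, 4, 6) match the abstract run. The `buggy_step` refinement yields 0, 3, 6 and is rejected.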

6.2 Status and Limitations

VSPEC provides a specification capability most appropriate for high levels of abstraction. We anticipate that designers will represent system requirements with VSPEC, gradually refining requirements into architectures and eventually into a VHDL design. During requirements specification, when a designer is defining the essential requirements of a system, VSPEC is useful for evaluating the impact of architectural decisions. When design details are available, VHDL simulation is a more suitable analysis activity. Although VSPEC can model design detail, formal analysis is far less pragmatic than VHDL simulation in such situations.

A potential limitation of the VSPEC approach is verifying the refinement of VSPEC requirements into VHDL design representations. Formalizing the tie between VSPEC and VHDL to support verification and comparison with simulation results is the subject of current investigations. In addition, techniques for automatically synthesizing VHDL from VSPEC are currently under development [21, 20]. Studies of error analysis reports for safety-critical software systems suggest that over 90% of safety-related errors arise from incorrect or incomplete specifications, not from the transformation of requirements into implementations [18]. This suggests that techniques such as those proposed here are warranted even before a complete verification path between VSPEC and VHDL exists.
Abstract

VHDL provides a means for operationally defining the behavior of digital components and for describing the composition of components. However, the operational and structural specification techniques require specification of a single design artifact. They do not provide an appropriate means for representing requirements. This paper describes VSPEC, a specification language used in conjunction with VHDL to axiomatically describe component requirements. VSPEC supports specification of constraints and performance requirements as well as the function of a component. Using the VHDL architecture construct, VSPEC can also specify requirements for abstract system architectures.

1 Introduction

VSPEC is motivated by the need to specify digital system requirements in an implementation-independent fashion. Qualitatively, system requirements specify "what" a system should achieve without specifying "how" it should be done. Design specifications are developed from requirements and describe "how" requirements are implemented. Although VHDL [6] supports specification of specific designs, it does little to support requirements specification. In addition, VHDL does not support a consistent representation of constraints.

Lack of requirements and constraint specification has little impact when designing systems requiring few levels of abstraction. However, there is a growing need for systematic design of very large, abstractly defined systems. When design starts from extremely high levels of abstraction, the structure of the eventual design is not reflected in the requirements. Thus, it is difficult to relate an operational specification back to the requirements it is meant to exhibit. In such situations, explicit requirements and constraint specification allow a designer to work at a high level of abstraction without interference from the details of lower levels.

This paper describes VSPEC, an extension of VHDL which addresses the problem of representing requirements explicitly. The remainder of this paper presents VHDL's method of design specification and VSPEC's additions, the structure of VSPEC and its associated formal basis, how VSPEC and VHDL can be used to specify abstract architectures, and the relationship between VSPEC and algebraic specification.

1.1 VHDL Design Specification
Specification of a design in VHDL involves three basic constructs: (1) the entity specifies the interface of a system; (2) the architecture specifies the behavior and/or structure of a system; and (3) the configuration associates a specific architecture with an entity. The designer specifies a device interface using the entity construct, develops one or more structural or behavioral descriptions using the architecture, and selects a specific implementation for the entity using the configuration construct.

Each architecture associated with an entity represents a potential design at some level of abstraction. Structural specifications indicate how components are composed to construct a solution. Behavioral specifications describe the behavior of a solution using an Ada-like programming language. In both cases, specific candidate designs are represented. A specific design is selected by comparing the behavior of that design with the set of system requirements.

Representation of system requirements in VHDL is restricted to an operational style: a "program" is written that describes an artifact having the desired characteristics. Although the operational style is an excellent means for describing specific designs, it is not ideal for describing system requirements, for several reasons.

1. It forces representation of a specific design, thus introducing implementation bias.

2. It does not adapt easily to representation of performance constraints.

3. Unimportant characteristics are indistinguishable from required features of the design.

4. Users must deal with unnecessary detail.

Figure 1a is an example VHDL entity representing a component that searches a collection of records for a specific record. Note that there is no indication of what the component must accomplish or what performance constraints exist for it. The result is a black-box view of the component with no indication of requirements, as shown in Figure 1b. An architecture can be developed, but such an architecture exhibits the negative characteristics discussed above.

2 VSPEC Requirements Specification

One solution to requirements representation in VHDL is VSPEC, a Larch interface language [3] developed for VHDL synthesis. The Larch family of specification languages consists of a collection of application-specific interface languages and a common shared language. Each interface language defines sets of specification primitives containing useful constructs in a target application language. The shared language serves two purposes. First, it provides a target formal system for translating interface specifications. Second, it provides a language for writing auxiliary specifications and handbooks of common components.

The traditional shared language is a first-order algebraic language called LSL. In VSPEC, the primary shared language is Refine, due to its support for transformation and synthesis, its formal basis, and its potential for execution.

Figure 2a shows the VSPEC representation of the same search as the VHDL entity in Figure 1. The added clauses specify input conditions, output conditions and constraints. Figure 2b shows a graphical representation of the same information. The VSPEC definition indicates that Vcc must be less than or equal to 5 and that the area (x × y) must be less than 0.3. No constraints are placed on heat dissipation (H), clock speed (clk) or timing.

The specification associated with Figure 2 avoids many of the problems of the operational specification style. The ensures clause specifies a search routine independently of any implementation. Only the characteristics necessary for specifying a search are included. Constraints are clearly specified in the constrained by clause and do not interfere with the functional specification. The designer need not be concerned with the details of the search algorithm at the requirements level.

3 The VSPEC Entity

All VSPEC annotations affect only the VHDL entity structure. No changes are made to architecture structures or any other VHDL structure. VSPEC clauses are grouped into four broad classes: (1) those that define a device's function; (2) those that define internal state variables; (3) those that define constraints; and (4) those that relate VHDL data structures to formal representations.

3.1 VSPEC Clauses and Logic

VSPEC is a collection of keywords followed by logical sentences. The keywords indicate which requirement each logical sentence specifies. Each logical sentence is written in typed first-order predicate calculus. Extensions to the logic allow the use of sets and sequences in specifications. The logic follows the basic syntax of Refine, the language used for system synthesis, to support easy translation and some degree of execution.

There are six basic VSPEC clauses:

- requires - specifies sufficient conditions on inputs and state for entity execution
- ensures - specifies necessary conditions on outputs and state following entity execution
- constrained by - specifies non-functional performance constraints
- modifies - specifies what the entity may alter
- based on - associates VHDL data types with Refine definitions
- state - defines a collection of variables that represent the entity's internal state

VSPEC clauses may only access variables and signals that are defined in an entity port or the state clause, or that are quantified in a logical expression. VSPEC is strongly typed, and all variables must have an associated type, including those bound by quantifiers. Although Refine allows type inferencing, VSPEC does not.

All VSPEC clauses are optional. Only the based on clause may appear more than once in an entity. The format of the requires, ensures and constrained by clauses is a keyword followed by a logical expression and a semicolon.

<keyword> <logical-expression>;
entity search is
  port (input: in array of element;
        k: in keytype;
        output: out element);
end search;

Figure 1: A VHDL entity describing a record search: (a) the entity declaration; (b) the corresponding black-box component.
The format of the state and modifies clauses is a keyword followed by a collection of variables, optionally typed.

<keyword> <variable>[, <variable>]...

The format of the based on clause is a type name followed by the based on keyword and a logical expression.

<type> based on <logical-expression>

3.2  Functional  Requirements 

The functional requirements of a VSPEC entity are defined using the requires and ensures clauses. The requires clause specifies a logical expression, I(x), that must be true for the entity to perform its operation. The ensures clause specifies necessary state conditions, O(x, z), resulting from entity execution given a particular input. Formally, any function F implementing an entity with input domain D must obey the condition:

$$\forall x{:}D.\; I(x) \Rightarrow O(x, F(x)) \qquad (1)$$

3.2.1  The  requires  Clause 

The requires clause, I(x), is a logical expression defined over all ports, signals and variables that may provide input to the transform. I(x) is true when x is a valid input. I(x) is a precondition for entity execution: when it is true, the entity must produce valid output.

3.2.2 The ensures Clause

The ensures clause, O(x, z), is a logical expression defined over all ports, signals and variables. O(x, z) is true when z is a valid output given x as input. O(x, z) is a postcondition for entity execution and states the necessary conditions placed on entity outputs and state variables.
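Condition (1) can be checked exhaustively when the domain is small. The Python sketch below assumes a hypothetical reading of the search requirements, since the actual clauses of Figure 2 are not restated here:

- requires: the key occurs in the input.
- ensures: the output is an input record carrying that key.

`satisfies` evaluates ∀x ∈ D. I(x) ⇒ O(x, F(x)) over an enumerated D, calling the candidate F only where I(x) holds.

```python
from itertools import product
from typing import Callable, Iterable, Iterator, List, Tuple

Record = Tuple[int, str]          # hypothetical (key, payload) record
Input = Tuple[List[Record], int]  # (input array, search key k)


def requires(x: Input) -> bool:
    """I(x): the key occurs in the input (assumed precondition)."""
    records, k = x
    return any(key == k for key, _ in records)


def ensures(x: Input, z: Record) -> bool:
    """O(x, z): z is one of the input records and carries key k."""
    records, k = x
    return z in records and z[0] == k


def satisfies(F: Callable[[Input], Record], domain: Iterable[Input]) -> bool:
    """Check  forall x in D. I(x) => O(x, F(x)); F is called only where I(x) holds."""
    return all(ensures(x, F(x)) for x in domain if requires(x))


def small_domain(max_len: int = 3, keys: Tuple[int, ...] = (0, 1, 2)) -> Iterator[Input]:
    """Enumerate every input array up to max_len records, paired with every key."""
    for n in range(max_len + 1):
        for ks in product(keys, repeat=n):
            records = [(key, f"r{i}") for i, key in enumerate(ks)]
            for k in keys:
                yield (records, k)


def linear_search(x: Input) -> Record:
    records, k = x
    for r in records:
        if r[0] == k:
            return r
    raise ValueError("precondition I(x) violated")


def first_record(x: Input) -> Record:
    return x[0][0]  # wrong: ignores the key
```

`linear_search` satisfies the condition over this domain. `first_record` does not, because it returns a record without checking its key.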

3.3  Constraints 

Constraints express characteristics an entity must exhibit that are not part of its function. For example, heat dissipation frequently affects the selection of valid designs, but heat is a side effect of the technology. It has little to do with the input and output relationship.

Although constraints do not affect function, they are critical in hardware system design. In VSPEC there are two sources of constraint. The first is the constrained by clause, which specifies several performance constraints common in hardware design. The second is the modifies clause, which limits what the entity can alter in performing its function.


3.3.1  The  constrained  by  Clause 

The constrained by clause is a conjunction of predefined variables and relations with fixed values. VSPEC currently supports constraint information for heat dissipation, area, clock speed, power consumption and pin-to-pin timing. To specify a constraint, one chooses a constraint type and uses it in a relation. For example, to specify heat dissipation of at most 1 watt and power consumption of at most 10 watts, the logical sentence heat =< 1 and power =< 10 is included in the constrained by clause.

Timing requires a somewhat more complicated representation. Here one specifies an interval between two pins, then relates that interval to a constant time. For example, (a<->b) =< 10 specifies that the time between a signal arriving at port a and port b producing a signal must be no more than 10.
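A constrained by clause such as heat =< 1 and power =< 10 and (a<->b) =< 10 can be read as a list of relations. These relations are evaluated against a candidate design's estimated or measured characteristics. The Python sketch below is a hypothetical encoding of that idea. The metric names, the pin-pair key used for timing and the unit-free numbers are assumptions, not VSPEC's actual representation.

```python
import operator
from typing import Callable, Dict, List, Tuple, Union

Quantity = Union[str, Tuple[str, str]]   # "heat", "power", ... or a pin pair (a, b)
Relation = Tuple[Quantity, str, float]

OPS: Dict[str, Callable[[float, float], bool]] = {
    "=<": operator.le, "<": operator.lt, ">=": operator.ge, ">": operator.gt}


def violations(clause: List[Relation], metrics: Dict[Quantity, float]) -> List[Relation]:
    """Return the relations a design violates; an empty list means the clause holds.
    Quantities the clause does not mention are left unconstrained."""
    return [(q, op, bound) for q, op, bound in clause
            if not OPS[op](metrics[q], bound)]


# heat =< 1 and power =< 10 and (a<->b) =< 10
CLAUSE: List[Relation] = [("heat", "=<", 1.0),
                          ("power", "=<", 10.0),
                          (("a", "b"), "=<", 10.0)]
```

A design whose power is 12 violates only the power relation. Any `area` entry in the metrics is ignored because this clause places no constraint on it. Every quantity the clause does mention must be supplied, or the lookup raises `KeyError`.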

3.3.2 The modifies Clause

The modifies clause specifies a collection of ports, signals and variables that may be modified by the entity. The modifies clause indicates which effects and side effects are allowed. Only outputs may be specified in a modifies clause. Of particular interest is the ability to specify the direction of buffer ports.
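The modifies clause acts as a frame condition: anything it does not list must keep its value across an execution of the entity. The Python sketch below checks that condition on a pair of pre- and post-states represented as dictionaries. The state representation and the names used in the example are hypothetical.

```python
from typing import Any, Dict, Iterable, Set


def frame_violations(pre: Dict[str, Any], post: Dict[str, Any],
                     modifies: Iterable[str]) -> Set[str]:
    """Names outside the modifies list whose value changed (or disappeared)."""
    allowed = set(modifies)
    return {name for name, value in pre.items()
            if name not in allowed and (name not in post or post[name] != value)}
```

For example, if only `out` is listed, a post-state that also changes `x` is reported as `{'x'}`.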

3.4  Abstract  Data  Types 

The semantics of VHDL data types must be defined before reasoning about their properties is possible. Elemental data types such as integer and bit have definitions loaded as part of the VSPEC system. Thus, when a basic VHDL type is used, the semantics of that type are present by default.

3.4.1  The  based  on  Clause 

User-defined data types such as arrays and records must be defined as part of the specification process because they cannot be defined a priori. This is accomplished using the based on predicate. The logical expression in a based on clause defines the semantics of a user-defined type. To support this specification process, VSPEC includes standard schemas for defining sets, sequences, arrays and tuples. These schemas are used in conjunction with parameter morphism to define associated VHDL types specific to user needs.

3.5  System  State 

The notion of system state is typically not supported directly by axiomatic specification techniques. A computation unit is defined by a transform that relates inputs to outputs. Thus, to include state in a specification, it must be specified as an input to the transform. However, specification of state-based systems is natural to hardware designers, whereas making the state representation an input to the VHDL entity is not. Using the two-tiered specification approach, state can be managed by: (a) supporting the definition of local state variables; and (b) using the state-maintaining features of port signals. Instead of specifying a function that maps input signals defined in the port definition to outputs in the same port definition, one specifies a function that maps inputs and state-maintaining objects to outputs and state-maintaining objects.

3.5.1  The  state  clause 

The state clause is a collection of variables that store state within a VSPEC entity. Like VHDL variables and signals, these variables maintain their values from one invocation of the entity to the next. All state variables are defined locally and are not visible outside the entity.

3.5.2  Ports 

Variables defined in an entity's port definition may also maintain their state. Ports of mode buffer may serve as inputs or outputs and are not re-initialized unless some signal is driving them. Ports of mode out and inout also maintain their state.

4  Generic  Architectures  in  VSPEC 

VSPEC supports representation of high-level, abstract architectures using the architecture construct from VHDL. No modifications or annotations are necessary: simply specify the entity structures accessed by the architecture using VSPEC.


Figure 3 represents a two-component architecture for solving the element search problem. The architecture bat-seq represents a two-step solution: sorting the input list and then using a binary search to find the desired record. Although the requirements of the sorting algorithm are specified, no algorithm is presented. Thus, the designer may instantiate the sort with any appropriate algorithm. Application of such an architecture represents an iterative refinement process common to design activities. Additionally, VSPEC is adept at representing such refinements where an operational language may fall short.

5  VSPEC  and  Algebraic  Specification 

Any VSPEC definition can be transformed into a formal definition. The form of this definition is an algebraic specification based on an extension of domain theories as defined in CYPRESS [7] and KIDS [9, 8]. The basic form of a domain theory is a tuple consisting of the function domain ($D$), range ($R$), input precondition ($I(x : D)$) and output postcondition ($O(x : D, z : R)$), commonly referred to as a DRIO model. The DRIO model for any VSPEC entity can be constructed using the following rules:

$D = t_1 \times t_2 \times \cdots \times t_n$, where $t_k$ is the sort representing the type associated with an in, inout, or buffer port, or a state variable

$R = t_1 \times t_2 \times \cdots \times t_m$, where $t_j$ is the sort representing the type associated with an out, inout, or buffer port listed in the modifies clause, or a state variable listed in the modifies clause

$I(x : D) = I_V(x : D)$, where $I_V(x : D)$ is the logical sentence defined by the requires clause

$O(x : D, z : R) = O_V(x : D, z : R)$, where $O_V(x : D, z : R)$ is the logical sentence defined by the ensures clause
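The construction rules above are mechanical enough to sketch in code. The following Python is a minimal illustration, assuming a hypothetical representation of ports as (name, mode, sort) records and state variables as (name, sort) pairs; sort names such as seq(element) are placeholders, not the Refine sorts VSPEC actually generates, and listing port sorts before state sorts is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

@dataclass
class Port:
    name: str
    mode: str   # "in", "out", "inout" or "buffer"
    sort: str   # placeholder name for the sort of the port's VHDL type

def drio_signature(ports: List[Port], state: List[Tuple[str, str]],
                   modifies: Set[str]) -> Tuple[List[str], List[str]]:
    """Component sorts of D and R, following the construction rules above."""
    # D: every in, inout or buffer port, plus every state variable.
    domain = [p.sort for p in ports if p.mode in ("in", "inout", "buffer")]
    domain += [sort for _, sort in state]
    # R: out, inout or buffer ports and state variables named in modifies.
    rng = [p.sort for p in ports
           if p.mode in ("out", "inout", "buffer") and p.name in modifies]
    rng += [sort for name, sort in state if name in modifies]
    return domain, rng

# A record-search entity: no state clause, modifies output.
search_ports = [Port("input", "in", "seq(element)"),
                Port("k", "in", "keytype"),
                Port("output", "out", "element")]
D, R = drio_signature(search_ports, state=[], modifies={"output"})
print(D)  # ['seq(element)', 'keytype']
print(R)  # ['element']
```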

Additionally, constraints must be defined as part of the algebraic statement. The simplest means of accomplishing this is to include predicates representing constraints in the output function of the DRIO model. However, constraints are not functional; specifying constraints in their own clause is an attempt to separate constraint from function. Moreover, constraints in their current form do not depend on variables defined in the entity.¹ Thus, constraints are added to the DRIO model through a specification morphism that adds logical representations of constraints, and the DRIO model becomes a DRIOC model:

$C(c_1 : C_1, \ldots, c_n : C_n) = C_V(c_1 : C_1, \ldots, c_n : C_n)$, where $c_k$ is a constraint variable such as heat or area, $C_k$ is the sort associated with a constraint variable, and $C_V$ is the logical expression defined in the constrained by clause

The goal of the design activity is to find an architecture that performs the transform $F : D \to R$ such that:

$$\forall x : D \cdot I(x) \Rightarrow O(x, F(x)) \qquad (2)$$

Thus,  the  goal  of  the  synthesis  activity  is  generation 
of  a  transform  mapping  the  current  state  and  inputs 
into  the  next  state  and  outputs  such  that  the  output 
condition  is  satisfied. 
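Equation (2) can be checked exhaustively over a small, finite slice of D, which gives a quick sanity test of a candidate F (a bounded check, not a proof). The Python sketch below does this for the record-search example under stated assumptions: records are hypothetical (key, payload) tuples and the universe of elements is finite. Because a postcondition of the form output = e <=> key(e) = k and e in input pins output to the unique matching element, no output can satisfy it unless exactly one element of the input has key k, so the sketch assumes that stronger precondition for I.

```python
from itertools import product
from typing import Callable, Iterable, List, Optional, Tuple

Element = Tuple[int, str]   # hypothetical record: (key, payload)

def key(e: Element) -> int:
    return e[0]

def I(inp: List[Element], k: int) -> bool:
    # Assumed precondition: exactly one element of input carries key k.
    return sum(1 for e in inp if key(e) == k) == 1

def O(inp: List[Element], k: int, out: Optional[Element],
      universe: Iterable[Element]) -> bool:
    # (forall e: element) output = e <=> key(e) = k and e in input,
    # with "forall" restricted to a finite universe of elements.
    return all((out == e) == (key(e) == k and e in inp) for e in universe)

def F(inp: List[Element], k: int) -> Optional[Element]:
    # One candidate implementation: a linear scan.
    for e in inp:
        if key(e) == k:
            return e
    return None

def check(f: Callable[[List[Element], int], Optional[Element]],
          universe: List[Element], max_len: int, keys: List[int]) -> bool:
    """Bounded check of  forall x : D . I(x) => O(x, f(x))."""
    for n in range(max_len + 1):
        for combo in product(universe, repeat=n):
            inp = list(combo)
            for k in keys:
                if I(inp, k) and not O(inp, k, f(inp, k), universe):
                    return False
    return True

universe = [(1, "a"), (2, "b"), (3, "c")]
print(check(F, universe, max_len=3, keys=[1, 2, 3, 4]))  # True
```

Replacing F with an incorrect candidate, such as one that always returns the first record, makes the check return False.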

¹ A more complex constraint model could certainly include variables and signals. Our current constraint model does not allow this.
architecture bat-seq of search is
  component sorter
    port (input: in array of element;
          output: out array of element);
  end component;
  component bin_search
    port (input: in array of element;
          k: in integer;
          value: out element);
  end component;
  signal y: array of element;
begin
  b1: sorter port map (input, y);
  b2: bin_search port map (y, k, output);
end bat-seq;

entity sort is
  port (input: in array of element;
        output: out array of element);
  modifies output;
  ensures bag(input) = bag(output) and
          sorted(output)
end sort;

entity bin_search is
  port (input: buffer array of element;
        k: in integer;
        value: out element);
  modifies value;
  requires sorted(input);
  ensures
    (forall e: element)
      value = e <=> key(e) = k and
                    e in input
end bin_search;


Figure  3:  VSPEC  representation  of  a  search  architecture  using  a  batch  sequential  approach.  The  original  list  is 
sorted  and  a  binary  search  finds  the  desired  object  from  the  resulting  list. 
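The batch-sequential decomposition in Figure 3 can be mirrored as two composed Python functions. This is a sketch only: records are assumed to be hypothetical (key, payload) tuples, sorter uses Python's built-in sorted and bin_search uses the standard bisect module, standing in for whatever algorithms a designer would actually instantiate. The intermediate list y plays the role of the signal between b1 and b2, and bin_search returns None when no record matches, a case the figure's ensures clause does not cover.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Element = Tuple[int, str]   # hypothetical record: (key, payload)

def sorter(inp: List[Element]) -> List[Element]:
    # ensures bag(input) = bag(output) and sorted(output)
    return sorted(inp, key=lambda e: e[0])

def bin_search(inp: List[Element], k: int) -> Optional[Element]:
    # requires sorted(input); returns the record whose key is k, if any
    keys = [e[0] for e in inp]
    i = bisect_left(keys, k)
    return inp[i] if i < len(inp) and keys[i] == k else None

def bat_seq(x: List[Element], k: int) -> Optional[Element]:
    y = sorter(x)             # b1: sorter output feeds signal y
    return bin_search(y, k)   # b2: bin_search reads y

records = [(7, "g"), (2, "b"), (5, "e")]
print(bat_seq(records, 5))  # (5, 'e')
print(bat_seq(records, 4))  # None
```

Swapping sorter for any other sorting routine leaves bat_seq correct, which is the point of specifying only the sort's requirements.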


6  Related  Work 

As VSPEC is a Larch interface language for VHDL, it borrows from the construction of other interface languages. Specifically, VSPEC is styled after the LM3 Larch interface language for Modula-3 [5]. Odyssey Research Associates (ORA) is currently developing an alternative Larch interface language for VHDL [4]. This language does not support representation of constraints and is targeted at formal analysis rather than synthesis. ORA's interface language also differs in its treatment of time: an absolute-time temporal logic is used in specifying the function of an entity. Thus one can specify that a predicate becomes true at a specific time using the notation “$P(x)@t$”.

Another attempt to annotate VHDL is VAL [2]. VAL annotates all aspects of the VHDL design: all signals in the namespace of the VHDL representation are in the namespace of the VAL annotation. Thus, VAL annotates specific VHDL designs rather than representing requirements. ORA's interface language is similar in this respect, but does support separate requirements definition.

7  Future  Work

Current VSPEC research involves pursuing domain-specific support for specification activities and support for formal synthesis. An important aspect of any Larch language is its associated handbook. A handbook is simply a collection of reusable theories defined in the shared language. Handbook theories represent commonly used structures, algorithms and characteristics as well as domain-specific information. For VHDL, we are implementing theories to represent standard VHDL types, low-level logic functions and conversion routines. In addition, we are working on libraries to support specifications involving signal attributes such as event, stable, and delay. Theories for pin-to-pin timing, heat dissipation, power consumption, area and clock speed have been implemented to support constraint checking during the design process.

The isomorphic relationship between VSPEC and algebraic specifications is being used to exploit work in formal synthesis, specifically, developing morphisms between algorithms [10]. This involves development and implementation of theories useful in constructing multicomponent systems such as the batch sequential search algorithm appearing earlier in this paper.
Abstract

Systems engineering of computer-based systems demands explicit representation of functional requirements as well as constraints at each level of design abstraction. However, traditional design representation languages such as VHDL and Verilog do not support requirements representation independent from implementation. This work presents VSPEC, an axiomatic specification language designed to support requirements representation. VSPEC annotates VHDL entity structures, supporting declarative specification of input preconditions, output postconditions and performance constraints as part of the design representation. The declarative nature of the specification supports requirements definition independent of design representation.

1  Introduction 

It is commonly understood that engineering is a requirements-driven activity. Problem requirements are stated, and the engineering goal is to produce an artifact satisfying those requirements. Requirements can be broadly categorized into two classes: (1) functional requirements; and (2) constraints. Although the distinction between these two classes is frequently debated, functional requirements describe the intended transformation from input to output, while constraints describe other, non-functional restrictions placed on the solution. Both functional requirements and constraints must be represented and accounted for in a successful systems engineering activity.

VHDL [1] is a widely accepted design specification language for digital systems. It supports representation of artifacts at multiple levels of abstraction and provides both behavioral and structural descriptions. Unfortunately, VHDL supports only an operational specification style and provides no standard means for representing constraints. Thus, when used at the requirements level, VHDL forces the user to make implementation decisions early in the design process. As the desired result of requirements analysis is a description of “what” without regard to “how”, VHDL is not an appropriate requirements representation language. In addition, constraint information frequently used to choose between design alternatives is not explicitly represented.

VSPEC is a Larch [2] interface language for VHDL that supports declarative specification of both functional requirements and constraints. VSPEC defines functional requirements using an input precondition and an output postcondition defined over the ports and internal state of a VHDL entity. VSPEC defines constraint information using standard representations of heat dissipation, clock speed, delay time, area and power consumption limits. In addition, other constraint types may be defined by the user.

This paper describes the VSPEC language and how it is used to define systems-level requirements. A brief presentation of VHDL is given and its problems identified. The basic structure of VSPEC is then described, followed by specifics of language constructs. Also presented is a means for using VSPEC and structural VHDL to define high-level architectures, thus supporting high-level decomposition. Finally, the role of VSPEC in the design process is shown along with examples of its use.
2  VHDL  Design  Specification

Specification of a design in VHDL involves three basic constructs: (1) the entity specifies the interface of a system; (2) the architecture specifies the behavior and/or structure of a system; and (3) the configuration associates a specific architecture with an entity. The designer specifies a device interface using the entity construct, develops one or more behavioral or structural descriptions using the architecture construct, and selects a specific implementation for the entity using the configuration construct.

Each architecture associated with an entity represents a potential design at some level of abstraction. Behavioral specifications describe the behavior of a solution using an Ada-like programming language. Structural specifications indicate how components are composed to construct a solution. In both cases, specific candidate designs are represented. A specific design is selected by comparing the behavior of that design with the set of system requirements.

Representation  of  system  requirements  in  VHDL 
is  restricted  to  an  operational  style  -  a  “program” 
is  written  that  describes  an  artifact  having  desired 
characteristics.  Although  the  operational  style  is 
an  excellent  means  for  describing  specific  designs, 
it  is  not  ideal  for  describing  system  requirements 
for  several  reasons. 


1.  It forces representation of a specific design, thus introducing implementation bias.

2.  It  does  not  adapt  easily  to  representation  of 
performance  constraints. 

3.  Implementation/representation  specific  details 
are  indistinguishable  from  required  features  of 
the  design. 

4.  Users  must  deal  with  unnecessary  detail. 


Figure 1a is an example VHDL entity representing a component that searches a collection of records for a specific record. Note that there is no indication of what the component must accomplish or what performance constraints exist for it. The result is a black-box view of the component with no indication of requirements, as shown in Figure 1b. An architecture can be developed, but such an architecture exhibits the drawbacks discussed above.


entity search is
  port (input: in array of element;
        k: in keytype;
        output: out element);
end search;


(a) VHDL entity, above. (b) Block diagram: search takes an array and a key and produces an element; its function is shown only as “???”.


Figure 1: A VHDL entity describing a record search. Note that the entity defines only the interface. The architecture describes the function operationally.


3  VSPEC  Requirements  Specification 

A solution to requirements representation in VHDL is VSPEC, a two-tiered specification language developed for VHDL synthesis. VSPEC is designed using concepts developed for Larch [2] interface languages for software specification. The Larch family of specification languages consists of a collection of application-specific interface languages and a common shared language. Each interface language defines sets of specification primitives containing useful constructs in a target application language. The shared language serves two purposes. First, it provides a target formal system for translating interface specifications. Second, it provides a language for writing auxiliary specifications and handbooks of common components.

The traditional shared language is a first-order algebraic language called the Larch Shared Language (LSL) [3]. In VSPEC, the primary shared language is Refine [4, 5], due to its support for transformation and synthesis, its formal basis, and its potential for execution.

Figure 2a shows the VSPEC representation for the same search as the VHDL entity in Figure 1. The added clauses specify input conditions, output conditions and constraints. Figure 2b shows a graphical representation of the same information. The VSPEC definition indicates that power consumption must be less than or equal to 5 mW and that the size (x × y) must be at most 3 µm × 5 µm. No constraints are placed on heat dissipation (H), clock speed (Clk) or timing.


entity search is
  port (input: in array of element;
        k: in keytype;
        output: out element);
  modifies output;
  requires true;
  ensures
    (forall e: element)
      output = e <=> key(e) = k and
                     e in input
  constrained by
    power <= 5 mW and
    size <= 3 um * 5 um
end search;


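(a) VSPEC entity, above. (b) Block diagram: search takes an array and a key and produces an element, annotated with its constraint variables (e.g. H for heat dissipation).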


Figure 2: A VSPEC entity describing a record search. The functional requirements and constraints are explicitly represented as part of the entity construct.


The specification associated with Figure 2 avoids many of the problems with the operational specification style. A search routine is specified by the ensures clause independently of any implementation. Only characteristics necessary for specifying a search are included. Constraints are clearly specified in the constrained by clause and do not interfere with the functional specification. The designer need not be concerned with the details of the search algorithm at the requirements level.


4  The  VSPEC  entity 

All VSPEC annotations affect only the VHDL entity structure. No changes are made to architecture structures or any other VHDL structure. VSPEC clauses are grouped into four broad classes: (1) those that define a device's function; (2) those that define internal state variables; (3) those that define constraints; and (4) those that relate VHDL data structures to formal representations.

4.1  VSPEC  Clauses  and  Logic

VSPEC is a collection of keywords followed by logical sentences. The keywords indicate what requirement each logical sentence specifies. Each logical sentence is written in typed first-order predicate calculus. Extensions to the logic allow use of sets and sequences in specifications. The only variables allowed in each clause are: (1) ports; (2) variables defined in the entity's state clause; and (3) variables defined by quantifiers in the sentence. Both port and state variables are assumed to be universally quantified. The only exception to this rule is the constrained by clause, where variables defined in constraint theories are used exclusively.

VSPEC provides the following clauses:

- requires - specifies sufficient conditions on inputs and state for entity execution

- ensures - specifies necessary conditions on outputs and state following entity execution

- constrained by - specifies non-functional performance constraints

- modifies - specifies what the entity may alter

- based on - associates VHDL data types with Refine definitions

- state - defines a collection of variables that represent the entity's internal state

- includes - specifies that a shared language file containing data types and functions is used in the definition

- assumes - specifies assumptions made in defining the device¹
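In Larch-style interface languages, the modifies clause acts as a frame condition: anything not listed must keep its pre-invocation value. A minimal Python check of that reading follows, assuming pre- and post-invocation values are held in dictionaries keyed by port or state-variable name (a representation chosen only for illustration; respects_modifies is a hypothetical helper).

```python
from typing import Dict, Set

def respects_modifies(pre: Dict[str, object], post: Dict[str, object],
                      modifies: Set[str]) -> bool:
    """True if every port/state variable not listed in modifies is unchanged."""
    return all(post[name] == value
               for name, value in pre.items() if name not in modifies)

pre = {"input": [3, 1, 2], "output": []}
ok_post = {"input": [3, 1, 2], "output": [1, 2, 3]}
bad_post = {"input": [1, 2, 3], "output": [1, 2, 3]}   # also rewrote input
print(respects_modifies(pre, ok_post, {"output"}))   # True
print(respects_modifies(pre, bad_post, {"output"}))  # False
```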

VSPEC clauses may only access variables and signals defined in an entity port or the state clause, or quantified in a logical expression. VSPEC is strongly typed and all variables must have an associated type, including those bound by quantifiers. Although Refine allows type inferencing, VSPEC does not.

¹ This clause is not implemented in the current language parser, but will be included in a later release.

Logical statements in VSPEC are designed to mimic the syntax of VHDL as closely as possible. This supports ease of use by VHDL users and achieves the language-specific goals of a Larch interface language. For example, numerical constants follow VHDL format, logical connectives use their English names, and predicates defined on signals follow the <signal>'<property> convention defined for VHDL. This changes the standard Larch <variable>' representation for the post-execution value of <variable> to <variable>'post.
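A small Python sketch of how an ensures clause reads both values, assuming a hypothetical counter entity with a state variable count and an output port q (neither appears in the original). Pre- and post-invocation values are modelled as two dictionaries, so the predicate count'post = count + 1 and q'post = count'post becomes a function of both.

```python
from typing import Dict

State = Dict[str, int]

def ensures(pre: State, post: State) -> bool:
    # Hypothetical postcondition:  count'post = count + 1 and q'post = count'post
    # Undecorated names read the pre-state; 'post names read the post-state.
    return post["count"] == pre["count"] + 1 and post["q"] == post["count"]

def step(pre: State) -> State:
    # One candidate implementation of the entity's function.
    nxt = pre["count"] + 1
    return {"count": nxt, "q": nxt}

pre = {"count": 4, "q": 4}
post = step(pre)
print(post, ensures(pre, post))  # {'count': 5, 'q': 5} True
```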

4.2  State-Based  Specification 

The VSPEC model uses a classic state-based specification approach. The notion of system state is typically not supported directly by axiomatic specification techniques. A computation unit is defined by a transform that relates inputs to outputs. Thus, to include state in a specification, it must be specified as an input to the transform. However, hardware designers naturally think in terms of state-based systems, and requiring that the state representation be an input to the VHDL entity is unnatural. Using the two-tiered specification approach, state can be managed by: (a) supporting the definition of local state variables; and (b) using the state-maintaining features of port signals. Instead of specifying a function that maps input signals defined in the port definition to outputs in the same port definition, one specifies a function that maps inputs and state-maintaining objects to outputs and state-maintaining objects.


[Diagram: inputs → entity (function F, with state store S and constraints C) → outputs; I(x) labels the inputs and O(x, z) the outputs.]

Figure  3:  State-based  specification  model  that 
forms  the  basis  of  VSPEC  requirements  definition. 

The goal is to specify a function that accepts input values and the current state and generates output and a new state. To achieve this, VSPEC specifies an input precondition over inputs and state, and an output postcondition over outputs and state.


Figure 3 shows these relationships graphically. F is the function of the component, S stores the internal state, and C defines constraints. I(x) defines a precondition on inputs and state, while O(x, z) defines a postcondition on outputs and state given an input. Finally, C(c) defines a set of constraints under which the device must operate.

A device's interface is defined by the VHDL entity construct. VSPEC clauses reference the signals declared there rather than redefining the interface. VSPEC defines a device's function by providing S and stating I(x) and O(x, z). Finally, VSPEC defines constraints by defining predicates over c, a constraint variable set.

4.3  Internal  State 

The state clause defines a collection of variables and initial values defining the internal state of a component. These variables are not visible outside the entity. State variables maintain their values between entity invocations. As with any VSPEC symbol, the undecorated state variable name indicates the value before invocation and the name decorated with 'post indicates the value after invocation. Thus, values before and after invocation are accessible in the same definition.

It  is  important  to  note  that  VHDL  ports  also 
maintain  their  values  between  entity  invocations. 
However,  ports  are  visible  outside  the  entity  and 
need  not  be  defined  in  the  state  clause.  The 
state  clause  defines  only  new  variables  necessary 
for  internal  state  components.  It  is  possible  (even 
common)  for  components  having  no  state  clause  to 
be  state  based  using  only  port  values  as  state.  The 
entire  state  of  a  component  is  the  complete  set  of 
state  variables  and  ports.  Like  state  variables,  the 
'post  attribute  supports  accessing  both  a  port’s 
pre-invocation  and  post-invocation  values. 

4.4  Functional  Requirements 

The functional requirements of a VSPEC entity are defined using the requires and ensures clauses. The domain, D, of F is the set of all finite vectors consisting of: (1) ports providing input; and (2) state variables. The range, R, of F is the set of all finite vectors consisting of: (1) ports generating output; and (2) state variables. The direction indicators used in VHDL port definitions and the modifies clause determine which ports and state variables are appropriate for D and R. Note that a port or state variable may appear in both D and R.
The requires clause specifies a logical expression, I(x), that must be true for the entity to perform its operation. The vector x is an element of D. The ensures clause specifies necessary postconditions, O(x, z), resulting from entity execution given a particular input. The vector z is an element of R. Any function, F, implementing an entity must obey the condition specified in Equation 1. The pre- and postconditions, I and O, defined by the requires and ensures clauses, represent the entity's functional requirements.

$$\forall x : D \cdot I(x) \Rightarrow O(x, F(x)) \qquad (1)$$

Equation  1  defines  a  synthesis  goal  considering 
only  functional  requirements. 

4.5  Constraints 

Constraints express characteristics an entity must exhibit that are not part of its function. For example, heat dissipation constraints frequently affect the selection of valid designs, but heat is a side effect of the technology; it has little to do with input and output relationships.

Although constraints do not affect function, they are critical in system design. In VSPEC, two clauses are used to represent constraints. The first is the constrained by clause, which specifies several performance constraints common in hardware design. The second is the modifies clause, which limits what the entity can alter in performing its function. The constrained by clause is a conjunction of predicates defined over a constraint variable set, c. Adding constraints to Equation 1 results in the new synthesis goal for F shown in Equation 2. Note that C(c) is the conjunction of predicates specified in the constrained by clause.

$$\forall x : D \cdot I(x) \Rightarrow O(x, F(x)) \wedge C(c) \qquad (2)$$

Equation 2 defines a more realistic synthesis goal, adding constraints to the functional requirements. The variables in c are defined by underlying constraint theories and are not defined as part of each entity. When specifying an entity, constraint variables are inherited from the underlying constraint theory. The current default constraint set supports representation of power consumption, heat dissipation, clock speed, pin-to-pin timing and area. Users may define additional constraints as needed, using Refine to define theories. The new theory is loaded with the includes clause.
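The conjunction C(c) can be sketched as a list of predicates over a candidate design's constraint variables. This Python fragment encodes the constrained by clause of Figure 2, assuming power is given in mW and that size means the area x × y in µm²; the two candidate designs and their estimates are hypothetical. Note that C(c) is evaluated over properties of an implementation, not over the input vector x.

```python
from typing import Callable, Dict, List

Constraint = Callable[[Dict[str, float]], bool]

# constrained by  power <= 5 mW and size <= 3 um * 5 um  (Figure 2),
# with size read as an area in square micrometres (an assumption).
search_constraints: List[Constraint] = [
    lambda c: c["power_mW"] <= 5.0,
    lambda c: c["area_um2"] <= 3.0 * 5.0,
]

def C(c: Dict[str, float], constraints: List[Constraint]) -> bool:
    """C(c): the conjunction of the constrained-by predicates."""
    return all(p(c) for p in constraints)

candidate_a = {"power_mW": 4.2, "area_um2": 12.0}   # hypothetical estimates
candidate_b = {"power_mW": 6.0, "area_um2": 10.0}
print(C(candidate_a, search_constraints))  # True
print(C(candidate_b, search_constraints))  # False
```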


4.6  Data  Types 

The semantics of VHDL data types must be defined before reasoning about their properties is possible. Elemental data types such as integer and bit have definitions loaded as part of the VSPEC system. Thus, when using a basic VHDL type, the semantics of that type are present by default. VSPEC generates formal definitions of RECORD and ARRAY types using standard tuple and sequence constructs from Refine.
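As an illustration of the tuple and sequence encodings, the following Python sketch models a hypothetical record type as a NamedTuple and an array of records as a tuple of them, then defines the bag and sorted functions used in the sort entity's ensures clause. These Python definitions are stand-ins chosen for illustration, not Refine's own constructs.

```python
from collections import Counter
from typing import NamedTuple, Sequence

class Element(NamedTuple):
    # Hypothetical VHDL record type (not from the paper):
    #   type element is record
    #     key  : integer;
    #     data : integer;
    #   end record;
    key: int
    data: int

# An "array of element" is modelled as a finite sequence of records.
def bag(s: Sequence[Element]) -> Counter:
    """Multiset view of a sequence: order is forgotten, multiplicity kept."""
    return Counter(s)

def is_sorted(s: Sequence[Element]) -> bool:
    return all(a.key <= b.key for a, b in zip(s, s[1:]))

inp = (Element(3, 30), Element(1, 10), Element(3, 30))
out = tuple(sorted(inp, key=lambda e: e.key))
print(bag(inp) == bag(out) and is_sorted(out))  # True
```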

5  Architectures  in  VSPEC 

VSPEC supports representation of high-level, abstract architectures using the architecture construct from VHDL. A high-level architecture is a collection of interconnected component definitions. Each component is instantiated appropriately for a given problem. High-level architectures provide skeletal solutions for commonly used system architectures; their use is fundamental in complex system design. Taking a single VSPEC entity and using a VHDL configuration statement to assign a high-level architecture to it supports incremental design activities.

Structural VHDL defines systems by indicating interconnections between components. Within a structural VHDL architecture, components are identified and generic parameters instantiated. These components are then used to produce a netlist specifying component interconnection. This interconnection specification is declarative because it simply specifies what components are used and how they are connected. Rather than extend the structural VHDL architecture to represent high-level architectures, VSPEC uses it to define interconnections between specified components. VSPEC provides requirements definitions for any or all components in the architecture.

Figure 4 represents a two-component architecture for solving the element search problem specified earlier. The architecture bat-seq represents a two-step solution: sorting the input list and then using a binary search to find the desired record.

The architecture references two components, a sorter and a bin_search. In typical structural VHDL, structural or behavioral descriptions exist for each component, either decomposing the solution further or describing a behavioral solution. If the entity representation for each component is annotated with VSPEC, a third option is possible: no architecture is associated with either entity, thus specifying only component requirements. Now three specification options exist: (1) requirements for each component may be specified using VSPEC; (2) the implementation of each component may be specified using structural VHDL; or (3) the behavior of each component may be specified using behavioral VHDL. Realistically, all three will be in use at any given time due to varying stages of component design.

Although each component's requirements are specified, no component algorithms or assemblies are presented. However, this new requirements specification exists at a lower level of abstraction: some structural detail has been added, excluding some potential solutions and decreasing the overall abstraction level. Application of such an architecture represents an incremental refinement process common to design activities. By assigning bat-seq to the entity from Figure 2 using a configuration statement, a requirements decomposition is performed. The resulting architecture specifies requirements and interconnections for components, and an obligation exists to verify that the resulting decomposition is correct with respect to the entity's original requirements.

In addition to functional requirements, constraints play a large role in the architecture specification. Constraints are also “decomposed” across collections of components. The simplest example of this activity is budgeting power consumption, weight or heat dissipation. When budgeting, a fraction of the constrained value is assigned to each component in such a way that the initial constraint is met. For heat dissipation and power, the sum of the component constraint limits must not exceed the initial constraint limit.
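The additive budgeting rule above can be expressed as a one-line check. The following is a minimal Python sketch rather than a Refine theory; the function `budget_ok`, the component names and the numbers are hypothetical and serve only to illustrate the rule.

```python
def budget_ok(limit: float, component_limits: dict[str, float]) -> bool:
    """Naive budgeting theory for an additive constraint such as power or
    heat dissipation: the component limits must sum to no more than the
    parent's limit."""
    return sum(component_limits.values()) <= limit


# Hypothetical power budget (mW) for a two-component decomposition.
power_mw = {"sorter": 3.0, "searcher": 1.0}
print(budget_ok(5.0, power_mw))  # True:  3 + 1 <= 5
print(budget_ok(3.5, power_mw))  # False: 3 + 1 >  3.5
```

Constraints such as reliability, discussed next, do not fit this additive pattern and need their own model.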

Although budgeting is common and useful, not all constraints can be managed in this straightforward fashion. Maintainability, reliability, and reusability are examples of constraints that cannot be budgeted across component collections. However, the methodology still applies when a constraint model is developed and used to determine whether the decomposition meets the initial constraint limit. Although developing a safety metric, for example, may be difficult, once one is developed it can easily be incorporated into the VSPEC model.

architecture bat-seq of search is
  component sorter
    port (input: in array of element;
          output: out array of element);
  end component;
  component bin_search
    port (input: in array of element;
          key: in integer;
          value: out element);
  end component;
begin
  b1: c1 port map(x,y);
  b2: c2 port map(y,z);
end bat-seq;

entity sort is
  port (input: in array of element;
        output: out array of element);
  modifies output;
  ensures bag(input) = bag(output) and
          sorted(output)
  constrained by
    power <= 3 mW and
    size <= 1 um * 2 um
end sort;

entity bin_search is
  port (input: buffer array of element;
        k: in integer;
        value: out element);
  modifies value;
  requires sorted(input);
  ensures
    (forall e: element)
      value = e <=> key(e) = k and
      e in input
  constrained by
    power <= 1 mW and
    size <= 1 um * 2 um
end bin_search;

Figure 4: VSPEC representation of a search architecture using a batch sequential approach. The original list is sorted and a binary search finds the desired object in the resulting list.

Module fan-out is an example of a maintainability constraint that cannot be budgeted. Fan-out is the number of modules into which a single module decomposes. If fan-out is high, the decomposition may be too complex to manage effectively. A VSPEC model of fan-out adds a fanout predicate to the constrained by clause: specifying fanout(f) < 10 states that the fan-out of the component must be less than 10. The underlying fan-out theory defines fan-out as the number of submodules a component has, which provides a means of checking fan-out in an evolving system.
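Because fan-out is defined structurally, re-checking it as the decomposition evolves amounts to counting children in the design tree. A minimal Python sketch, assuming the decomposition is stored as a hypothetical mapping from each module to its submodules; it illustrates the idea and is not the Refine theory used by VSPEC.

```python
# Hypothetical representation: each module maps to the submodules it
# decomposes into; leaf modules map to an empty list.
Decomposition = dict[str, list[str]]


def fanout(tree: Decomposition, module: str) -> int:
    """Fan-out of a module: the number of submodules it decomposes into."""
    return len(tree.get(module, []))


def violations(tree: Decomposition, limit: int) -> list[str]:
    """Modules that fail a predicate of the form fanout(f) < limit."""
    return [m for m in tree if fanout(tree, m) >= limit]


design = {
    "search": ["sort", "bin_search"],
    "sort": [],
    "bin_search": [],
}
print(violations(design, 10))  # []
print(violations(design, 2))   # ['search']
```

Re-running `violations` after each refinement step gives the incremental check the text describes.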

6 Design Process

Using VSPEC and the VHDL architecture, incremental design results in a tree generated by specialization activities. Consider the earlier search problem. In this design activity, the initial requirements are shown in Figure 2. These requirements completely define the design problem, specifying both function and constraint.

When the high-level architecture bat-seq (Figure 4) is associated with the initial requirements, an incremental design decision is represented: the initial problem is decomposed into interconnected search and sort components, each with its own requirements and constraints. At this point in the design process, explicit constraint representation allows the user to check constraints, namely that power does not exceed 5 mW and size does not exceed 15 um. Naive constraint theories confirm that the component budgets do not exceed the high-level constraints; for power, the 3 mW budgeted to sort and the 1 mW budgeted to bin_search total 4 mW, within the 5 mW limit. Without explicit representation, such verification would not be possible. Although these theories are naive, more realistic theories are easily encoded as Refine specifications.

Functional requirements are also checked by comparing pre- and postconditions. In this case, I and O from the architecture match their corresponding specifications in the system description. Unfortunately, this will rarely be the case, so more complex checks will usually be required. However, the requirements are represented explicitly in the design representation and are available for verification.
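Formal comparison of pre- and postconditions needs a prover, but the shape of the proof obligation can be illustrated by testing it. The Python sketch below is an assumption-laden illustration, not VSPEC's mechanism: the predicates transcribe the sort and bin_search clauses of Figure 4 (treating key(e) as the element itself and using None for "no value"), Python's `sorted` and a hypothetical `bin_search` stand in for the two components, and random sampling checks that the sorter's postcondition establishes the searcher's precondition and that the composition meets the searcher's postcondition with respect to the original input.

```python
import bisect
import random
from collections import Counter
from typing import Optional


def is_sorted(xs: list[int]) -> bool:
    return all(a <= b for a, b in zip(xs, xs[1:]))


# sort: ensures bag(input) = bag(output) and sorted(output)
def sort_ensures(inp: list[int], out: list[int]) -> bool:
    return Counter(inp) == Counter(out) and is_sorted(out)


# bin_search: requires sorted(input)
def bin_search_requires(inp: list[int]) -> bool:
    return is_sorted(inp)


# Simplified searcher postcondition (assumes key(e) = e): the value equals
# the key exactly when the key occurs in the input; None models "no value".
def search_ensures(inp: list[int], key: int, value: Optional[int]) -> bool:
    return value == key if key in inp else value is None


def bin_search(xs: list[int], key: int) -> Optional[int]:
    """Hypothetical executable stand-in: binary search on a sorted list."""
    i = bisect.bisect_left(xs, key)
    return xs[i] if i < len(xs) and xs[i] == key else None


def check_decomposition(trials: int = 1000, seed: int = 0) -> bool:
    """Sample-based (not formal) check of the sort -> bin_search decomposition."""
    rng = random.Random(seed)
    for _ in range(trials):
        inp = [rng.randint(0, 20) for _ in range(rng.randint(0, 10))]
        key = rng.randint(0, 20)
        mid = sorted(inp)                   # stand-in for the sorter
        if not sort_ensures(inp, mid):
            return False
        if not bin_search_requires(mid):    # sorter's postcondition must
            return False                    # establish searcher's precondition
        if not search_ensures(inp, key, bin_search(mid, key)):
            return False
    return True


print(check_decomposition())  # True
```

Sampling can expose a faulty decomposition, but it cannot prove a correct one; that still requires the formal comparison described in the text.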

Finally, assume that each component is expressed using behavioral VHDL and fabricated, resulting in two hardware components. The fabrication results may be verified independently, with some confidence that their composition will meet the requirements. Additionally, if constraints cannot be met, trade-off decisions may be explored and verified within the context of the entire problem.


7 Related Work

7.1 Larch

VSPEC is based on Larch’s two-tiered specification approach and is a Larch interface language. VSPEC differs from existing Larch languages in its use of Refine as its shared language. The Larch Shared Language [3] is a first-order, algebraic language, while Refine is a broad-spectrum language that is both executable and formal. Refine is used because its environment supports software synthesis, whereas Larch is primarily useful for verification. VSPEC’s syntax is derived primarily from the Modula-3 interface language, LM3 [6].

7.2 VHDL Annotation Language (VAL)

VSPEC is frequently compared to the VHDL Annotation Language (VAL) [7]. VAL is an annotation language used to describe pre- and postconditions on VHDL input and output streams. In this respect, VAL and VSPEC are quite similar. However, several critical differences exist. First, VAL annotations translate into VHDL assert statements. An assert statement evaluates a boolean condition and, when the condition is false, reports a message at a given severity level. The assert is much like an exception in a traditional programming language and is used for similar purposes. Once transformed into assert statements, the VAL model is simulated on input streams and the result is compared to a simulation of the VHDL code for the same module. VSPEC supports execution, but this is not its primary purpose: its logic is not restricted to an executable subset. More importantly, the logic can be manipulated formally.

VAL supports annotation of behavioral and structural VHDL as well as the entity structure. Thus, VAL is an annotation language or design description language rather than strictly a requirements language.

Finally, VAL does not support constraint representation or checking. In the systems engineering environment, constraints are frequently more difficult to meet than functional requirements. Furthermore, they must be recorded as part of any requirements specification.

7.3 ORA’s Larch/VHDL

ORA is currently developing a Larch/VHDL interface language [8]. In many respects, this language is similar to VAL in its attempt to model entire systems rather than simply modeling requirements. Because it is manipulated formally, this language is being used to define a semantic model for VHDL. Like VAL, ORA’s interface language supports only timing constraints, and its usefulness is therefore limited in the systems engineering area. VSPEC differs substantially, supporting only requirements specification and including both function and constraint. VSPEC also models timing as a constraint, whereas ORA’s language uses a temporal logic to model timing attributes.

8 Current Status and Future Directions

Currently, an initial Language Reference Manual (LRM) for VSPEC is being developed. From the VSPEC LRM, a VSPEC parser and partial type checker have been developed using the Dialect component of the Software Refinery [4]. This parser is available via the World Wide Web and FTP.

Like VHDL, this version of VSPEC is limited to representing digital information. Plans exist to combine VSPEC with the AnaVHDL work underway at the University of Cincinnati. AnaVHDL supports specification of both analog and digital components in the same system. Because VSPEC is declarative and most circuit specifications are written as equations, this combination is quite natural. Open and interesting problems include interfaces between analog and digital components and reconciliation of timing information from the digital and analog worlds.

9 Summary

VSPEC is a Larch interface language for VHDL designed to represent design requirements for synthesis activities. VSPEC design goals center on: (1) requirements representation independent of implementation; and (2) constraint representation. VSPEC adds declarative clauses that describe a component’s functional requirements and constraints. Axiomatic specifications describe functional requirements by defining input preconditions and output postconditions. Predicates defined over constraint variables describe component constraints. VSPEC supports descriptions of high-level architectures using structural VHDL and allows representation of incremental design steps.
Abstract

Prototyping composite hardware/software systems requires synthesis of hardware, software and communications protocols. Capabilities exist to synthesize ASIC designs from a Pascal-like behavioral VHDL subset, and capabilities are being developed for transforming the same VHDL subset into standard software development languages. However, the process of synthesizing behavioral VHDL from systems-level requirements has not been addressed. Users are required to write behavioral VHDL descriptions of their components in a purely operational manner. This results in implementational bias and premature hardware/software allocation decisions. We propose automating this process by expressing systems-level requirements in a declarative specification language and using standard software synthesis techniques to generate behavioral VHDL from them.

1 Introduction

The overall goal of this research is synthesis of composite computing systems using traditional software synthesis techniques. A composite computing system is defined as a collection of computation units that may be implemented as either software or hardware components. To achieve this end, we adopt the high-level approach described in Figure 1.

The general flow through the system is as follows: (a) design requirements (including constraints) are parsed to generate a decorated abstract syntax tree used by synthesis processes; (b) the problem may be decomposed into components; (c) an algorithm is synthesized for each component; and (d) the assemblages of components, the general algorithms and the abstract syntax tree are transformed into an appropriate design representation. Given this design methodology, this research is decomposed into the following sub-goals:

1. Representation of system and component requirements

2. Generation of an intermediate form to support synthesis

3. Synthesis of component designs

4. Generation of output in an appropriate design representation language

Figure 1: Flow of information through the synthesis process, from VSPEC to VHDL

This paper deals primarily with our specification language, called VSPEC, and the methods used to synthesize algorithms from requirements suitable for use in behavioral VHDL. VSPEC describes computation units axiomatically, specifying an input precondition and an output postcondition.¹ VSPEC is a Larch [5] interface language for VHDL that translates both into the Larch Shared Language [5] and into the high-level programming language Refine. The transformation of VSPEC into Refine expresses the requirements in an independent form suitable for use by various software synthesis tools, including KIDS [14] and BENTON [2].

¹The process of parsing VSPEC to generate the appropriate

1.1 Experimental Domain

Our current domain is rapid prototyping of digital signal processing systems. This work is directed towards automated synthesis of board- and MCM-level signal processors from systems-level requirements. This synthesis domain includes ASICs, off-the-shelf components including CPUs, and embedded software.

The design representation language for this effort is mandated to be VHDL for hardware components and C for software components. In addition, as required by the sponsoring agency, all software components will be specified in VHDL first and then transformed into C. VHDL was selected because of the domain's heterogeneous nature and the United States Department of Defense's acceptance of VHDL as a systems representation language. C was selected because of the ready availability of C compilers for off-the-shelf digital signal processors and existing capabilities for performing VHDL-to-C transformations.

The general approach is synthesis of VHDL to represent both hardware and software components. Capabilities currently exist for transforming a rich subset of behavioral VHDL into RTL-level VHDL suitable for synthesis and fabrication [11]. Capabilities also exist for transforming behavioral VHDL into compilable C code. Thus, we can achieve our objective by taking a requirements description of a system, transforming it into behavioral VHDL, and synthesizing hardware and software components.

1.2 Axiomatic Specification

Specifying computation units using axiomatic specifications involves defining a transform by specifying an input precondition and an output postcondition. Given that the input precondition holds, the transform must guarantee that the output postcondition is made true. Smith [13] defines a specification to be an algebra specifying the domain, range, input precondition and output postcondition. Thus, a function such as that in Figure 2 may be described in terms of its domain (D), range (R), input precondition (I) and output postcondition (O). When I(x) holds for some input x of type D, the procedure must return some element z of type R such that O(x,z) holds. A function F(x) = z satisfies this specification when, for any x such that I(x) holds, F(x) generates z such that O(x,z) holds. Formally:


$$\forall x{:}D \;\bigl(I(x) \Rightarrow \exists z{:}R \;(F(x) = z \wedge O(x,z))\bigr) \qquad (1)$$
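For a finite domain, equation (1) can be checked directly by enumerating every input that satisfies I. This Python sketch is illustrative only: the helper `satisfies` and the integer-square-root specification are hypothetical examples, not taken from the paper, and enumeration stands in for the proof a real verifier or synthesizer would need over an infinite domain.

```python
import math
from typing import Callable, Iterable, TypeVar

D = TypeVar("D")  # domain type
R = TypeVar("R")  # range type


def satisfies(f: Callable[[D], R],
              domain: Iterable[D],
              pre: Callable[[D], bool],
              post: Callable[[D, R], bool]) -> bool:
    """Check equation (1) by enumeration over a finite domain: for every x
    with I(x), the result z = F(x) must satisfy O(x, z). F is applied only
    to inputs that meet the precondition."""
    return all(post(x, f(x)) for x in domain if pre(x))


# Hypothetical example specification: integer square root.
def pre_isqrt(x: int) -> bool:           # I(x): x >= 0
    return x >= 0


def post_isqrt(x: int, z: int) -> bool:  # O(x, z): z*z <= x < (z+1)^2
    return z * z <= x < (z + 1) * (z + 1)


print(satisfies(math.isqrt, range(100), pre_isqrt, post_isqrt))        # True
print(satisfies(lambda x: x // 2, range(100), pre_isqrt, post_isqrt))  # False
```

If no input in the domain satisfies I, the check holds vacuously, matching the implication in equation (1).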


This work relies on the assumption that hardware components may be specified in the same manner; specifically, that the transform associated with a hardware component can be defined by an appropriately selected domain, range, input precondition and output postcondition. A second assumption is that such axiomatic specifications can be used to synthesize hardware components. The first assumption, that hardware can be specified axiomatically, is easily made and is commonly used in formal verification of hardware. The second assumption is made based on the similarity between behavioral specification and traditional programming. The process component of VHDL supports specification of behavior using an Ada-like language. If Ada programs can be synthesized from requirements, then it stands to reason that VHDL programs can be as well. Semantically, however, VHDL and Ada differ substantially; the bulk of this paper addresses some of those differences.


internal representation is a simple compiler problem. The process of generating Refine [1] algorithms is a simple lateral transformation.


2 VSPEC

VSPEC is a Larch interface language [5] for VHDL. The VSPEC interface language annotates the VHDL entity structure, adding component requirements in terms of precondition, postcondition, performance constraints and state. Each structure in the VSPEC interface language translates into a formal definition in a shared language. VSPEC differs from a typical Larch interface language in that the primary shared language is Refine rather than the Larch Shared Language (the reasons for this difference are discussed later). To understand the VSPEC language, one must first have a cursory understanding of how VHDL represents systems.

2.1 VHDL

VHDL [9] is a specification language for digital systems whose structure and appearance are similar to Ada [15]. Although this structural similarity exists, it is somewhat deceptive, because the semantics of a VHDL specification differ substantially from those of a similarly structured Ada program.

A system is described in VHDL by describing its constituent components and the relationships between them. VHDL specifications consist of three fundamental construct types: (a) entity constructs describing component interfaces; (b) one or more architecture constructs describing each component’s behavior or structure; and (c) configuration constructs associating entities with specific architectures. Thus, an entity represents an interface, several architectures represent behavior and structure, and a configuration selects a specific architecture to represent the behavior of a component for a specific design task.

2.1.1 Entity Structures

An entity specifies the interface of each VHDL component much as an Ada public declaration specifies the interface of a procedure. The entity construct names the component and defines its ports. Ports are the hardware equivalent of parameters and represent the inputs and outputs, their types, and the direction of data flow. VHDL entity structures are connected by connecting their ports. Figure 3 is an entity describing a simple S-R latch. Note that the entity describes only the component interface, not its behavioral requirements or constraints.

function F(x: D) : R
begin
  -- {I(x)}    input precondition
  -- Function Body
  -- {O(x,z)}  output postcondition
  return z
end;

Figure 2: Axiomatic descriptions of: (a) a typical procedure; and (b) a typical hardware component.

entity sr_latch is
  port (s, r: in bit; q, qb: buffer bit);
end sr_latch;

Figure 3: A VHDL entity describing the interface to an S-R latch.

architecture behavior of sr_latch is
begin
  q <= NOT (qb AND s);
  qb <= NOT (q AND r);
end behavior;

Figure 4: A VHDL architecture describing the behavior of an S-R latch.

Parameters defined in the port definition are referred to as signals. Signals in VHDL are very similar to variables and parameters in a traditional programming language. Variables also exist in VHDL locally to processes; however, in this work signal assignment is assumed to include variable assignment. When defining the behavior of a VHDL entity, relationships between input and output ports are defined, much as relationships between input and output parameters are defined in a traditional programming language.

2.1.2 Behavioral Specification

VHDL supports specification of a component’s behavior directly, using an operational description language, or indirectly, using an assembly of other components. Behavioral specification involves writing a VHDL “program” in an operational VHDL subset similar in appearance to Ada. This subset includes familiar control structures and data types standard in procedural programming languages, as well as the signal assignment and synchronization constructs necessary to specify hardware components naturally. Figure 4 shows a behavioral description of the S-R latch.

The means of specifying behavior used in Figure 4 involves concurrent signal assignment statements. The values of q and qb are updated using the “<=” signal assignment operator. In this specification, value assignment to q and qb occurs simultaneously. Thus, the first assignment statement does not alter the program state before the second is evaluated, and the original value of q is maintained for the second assignment statement. The VHDL code that generates the assigned value is a driver. There should exist one and only one driver for each output signal.
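The simultaneous-update rule can be illustrated outside VHDL. The Python sketch below is a simplified zero-delay model, not a VHDL simulator (it ignores simulation time, delta cycles and drivers); the function names are hypothetical and bits are 0/1 integers. It evaluates the two NAND assignments of Figure 4 concurrently, compares this with a naive statement-by-statement reading, and shows that saving the old q in a temporary, as the process version of Figure 5 does, restores the concurrent result.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    """NAND on bits represented as 0/1 integers."""
    return 1 - (a & b)


def concurrent_step(s: int, r: int, q: int, qb: int) -> tuple[int, int]:
    """Concurrent assignments: both right-hand sides read the old q and qb,
    then both outputs are updated together."""
    return nand(qb, s), nand(q, r)


def sequential_step(s: int, r: int, q: int, qb: int) -> tuple[int, int]:
    """Naive imperative reading: the second statement sees the new q."""
    q = nand(qb, s)
    qb = nand(q, r)
    return q, qb


def temp_step(s: int, r: int, q: int, qb: int) -> tuple[int, int]:
    """Imperative reading with a temporary holding the old q."""
    tmp = q
    q = nand(qb, s)
    qb = nand(tmp, r)
    return q, qb


def settle(step, s, r, q, qb, max_iters=10):
    """Re-evaluate until the outputs stop changing; None if they never do."""
    for _ in range(max_iters):
        nq, nqb = step(s, r, q, qb)
        if (nq, nqb) == (q, qb):
            return q, qb
        q, qb = nq, nqb
    return None


print(concurrent_step(0, 1, 0, 1))          # (1, 1)
print(sequential_step(0, 1, 0, 1))          # (1, 0): differs after one step
print(settle(concurrent_step, 0, 1, 0, 1))  # (1, 0)
print(all(temp_step(*v) == concurrent_step(*v)
          for v in product((0, 1), repeat=4)))  # True
```

In this idealized model, starting from q = qb = 0 with s = r = 1, repeated concurrent evaluation oscillates between (1, 1) and (0, 0), so `settle` returns None.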

An alternative specification involves the use of process blocks. In a process, assignments do not occur simultaneously, and statements execute in a manner similar to a traditional programming language. Thus, the VHDL fragment in Figure 5 requires the introduction of a temporary variable, as is traditional in an imperative language. Note that the two behaviors, specified using concurrent assignments and processes respectively, are identical.

architecture behavior of sr_latch is
begin
  p1: process (s, r, q, qb)
    variable tmp : bit;
  begin
    tmp := q;
    q <= not (qb and s);
    qb <= not (tmp and r);
  end process;
end behavior;

Figure 5: A VHDL architecture describing the behavior of an S-R latch using a single process.

If multiple processes exist in an architecture, all processes execute simultaneously. Thus, the concurrent assignment statements described previously are a shorthand notation for a collection of processes with single assignments to output signals. The process equivalent of Figure 4 is shown in Figure 6.

architecture behavior of sr_latch is
begin
  p1: process (qb, s) begin
    q <= NOT (qb AND s);
  end process;
  p2: process (q, r) begin
    qb <= NOT (q AND r);
  end process;
end behavior;

Figure 6: A VHDL architecture describing the behavior of an S-R latch.

The parallels between process descriptions and traditional programming languages are exploited to synthesize behaviors for single entities. The objective is synthesis of code for process statements and/or concurrent assignments. Problems are decomposed with respect to output ports and composed using the concurrent assignment or process facilities.

2.1.3 Structural Specification

Structural specification involves specifying a collection of VHDL entity components and the connections between them. Using the component statement, the architecture specifies the components used in the assemblage and assigns local names to ports. The body of the structural architecture names each local component, assigns a component from the declarative section to it, and specifies connections involving the local component using local parameter names. Figure 7 shows a structural specification of an S-R latch.


architecture structure of sr_latch is
  component nor2
    port (a, b : in bit; c : out bit);
  end component;
begin
  n1: nor2 port map (s, qb, q);
  n2: nor2 port map (r, q, qb);
end structure;

Figure 7: A VHDL architecture describing the structure of an S-R latch.

Together, the entity and architecture constructs describe a component’s inputs, functional behavior and structure. Many architectures may exist for a single entity, so the configuration structure is used to specify which architecture should be associated with each entity. A typical VHDL design process involves specifying component interfaces, specifying behavior, and refining the behavior to specify an implementation as a structural specification.

Given a behavioral description, there are automated and semi-automated means of refining that description. It is currently possible to synthesize directly implementable designs as large as small CPUs from behavioral VHDL [11]. Many commercial VHDL support environments include synthesis subsystems. Thus, prototype system synthesis is achieved by generating behavioral VHDL and using lower-level synthesis tools to generate code, ASICs, and board layouts.

2.2 VSPEC Entities

VSPEC adds six declarative clauses to the VHDL entity: (1) the state clause declares variables representing the state of the component; (2) the requires clause² states the component’s precondition and is a function mapping entity input and state variables onto the boolean set; (3) the ensures clause states the component’s postcondition and is a function mapping input, output and state variables onto the boolean set; (4) the modifies clause names input, output and state variables whose values may be changed by the component; (5) the constrained by clause states performance constraints associated with the component; and (6) the based on clause associates primitive and user-defined types with shared language representations.

Each clause (with the exception of the modifies and based on clauses) is stated as a logical expression in typed first-order predicate calculus with equality, extended to include set and sequence theories. The only variables allowable in the logical expressions are those defined in the entity’s port definition or the VSPEC state clause, or defined locally in a logical expression as quantified variables. All variables must be typed, and typing requirements are checked by the VSPEC parser.

The VSPEC parser transforms each clause into a Refine logical expression used to drive synthesis and analysis algorithms. Figure 8a shows a generic VSPEC entity definition with each VSPEC clause.
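Of the six clauses, modifies has the meaning that is easiest to check mechanically: any variable not named in it must be left unchanged. A minimal Python sketch, assuming variable values are available as before/after snapshots; the function name and snapshot format are hypothetical, and this is an illustration rather than how the VSPEC parser or its Refine translation handles the clause.

```python
def respects_modifies(before: dict[str, int],
                      after: dict[str, int],
                      modifies: set[str]) -> bool:
    """Frame condition for a modifies clause: every variable not named in
    the clause must keep its value across the component's execution."""
    return all(v in after and after[v] == before[v]
               for v in before if v not in modifies)


# Hypothetical snapshots of inputs a, b, output c and state s,
# for an entity declaring "modifies c".
before = {"a": 1, "b": 0, "c": 0, "s": 1}
print(respects_modifies(before, {"a": 1, "b": 0, "c": 1, "s": 1}, {"c"}))  # True
print(respects_modifies(before, {"a": 1, "b": 0, "c": 1, "s": 0}, {"c"}))  # False
```

The second call fails because the state variable s changes without being named in the clause.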

2.3 Representation of Architectures

VHDL represents connected collections of components using architectures, as shown in Figure 7. Components are declared, and their port parameters are used to indicate interconnection. The example shown in Figure 9 defines a two-stage, batch sequential approach to searching a collection of values. The input list is sorted and a binary search is applied to the result of sorting. The architecture represents the batch sequential approach by defining a sorting component, defining a binary searching component, and connecting the outputs of the sorter to the inputs of the searcher. Note, however, that the search and sort components’ implementation details are not specified. Specific algorithms must be synthesized at some later point.

Thus, the architecture notion is used in conjunction with VSPEC-annotated VHDL entities to represent system architectures. Representing requirements for multi-component systems also allows VSPEC to represent composite, multi-component systems by supporting specification of hardware executing software processes and of complex device intercommunication.


²In earlier versions of VSPEC and in earlier papers, the requires clause was called the assumes clause.
entity example is
  port (a, b: in bit; c: out bit);
  modifies c;
  state s: bit;
  requires I(a,b);
  ensures O(a,b,s,c) and D(a,b,s,s');
  constrained by Q;
end example;

D = bit × bit × bit
R = bit × bit
I = I(a,b,s)
O = O(a,b,s,c) ∧ D(a,b,s,s')
C = Q

Figure 8: (a) VSPEC definition and (b) associated tuple representation.


entity sort is
  port (input: in array of integer;
        output: out array of integer);
  modifies output;
  ensures bag(input) = bag(output) ∧ sorted(output);
end sort;

entity bin_search is
  port (input: buffer array of integer;
        key: in integer;
        value: out integer);
  modifies value;
  requires sorted(input);
  ensures ∀x: integer · value = x ⇒ x ∈ input;
end bin_search;

entity search is
  port (input: buffer array of integer;
        key: in integer;
        value: out integer);
  modifies value;
  ensures ∀x: integer · value = x ⇒ x ∈ input;
end search;

architecture bat-seq of search is
  component sorter
    port (input: in array of integer;
          output: out array of integer);
  end component;
  component bin_search
    port (input: in array of integer;
          key: in integer;
          value: out integer);
  end component;
begin
  b1: c1 port map(x,y);
  b2: c2 port map(y,z);
end;

Figure 9: Using a VHDL architecture to represent general structures. Note that a VSPEC entity is used to represent the requirements of each component.
3 Parsing VSPEC

Algorithm synthesis does not operate on raw VSPEC. Before algorithm synthesis begins, the VSPEC definition is parsed into a decorated abstract syntax tree. The abstract syntax tree is represented using the REFINE object base, and the parser is written in the Dialect system.

In the abstract syntax tree, each entity is represented by its constituent components from the interface language: the port and state clauses representing the system interface and internal state, the input precondition, the output postcondition, and any existing performance constraints. Together, these define a specification as a problem theory [13], supporting the use of KIDS and other similar transformation systems.

The abstract syntax tree also represents architectures. Each architecture is linked to the entity representing its interface and requirements. Refining our synthesis objective leads to the goal of associating each entity with at least one architecture that has either a behavioral description or a structural description whose components have complete behavioral descriptions at some level of abstraction. Figure 8b shows the result of parsing a VSPEC entity.

4  VHDL  Synthesis  From  VSPEC 

After VSPEC is parsed into abstract syntax tree form, synthesis activities begin. For each entity, a suitable architecture must be synthesized. From the port descriptions and VSPEC clauses, a domain theory is formed and represented using the DRIO (domain, range, input, output) notation proposed by Smith [12, 13, 14]. The user guides the selection of a general structure involving either a single behavioral architecture or a structural architecture specifying a configuration of components.

4.1  Synthesis  Goals 

Although VHDL and Ada share structural similarities, VHDL should not be viewed as simply a programming language. Several characteristics of VHDL representations must be accounted for in the synthesis process. These include the co-existence of entities, the state machine nature of entity descriptions, signal attributes, and concurrent assignment.

A naive examination of VHDL may lead to the belief that an entity is equivalent to a procedure or function in a traditional imperative language, so that connections between components define a sequential control flow. In a VHDL description, however, entity components represent concurrently existing devices and processes. Components are activated by changes in parameter values, not by explicit calls and parameter passing. Each entity description has a sensitivity list indicating which parameter changes can cause its invocation. When invoked, the architecture implementing the entity is executed to completion, interacting with other entity structures only through changing port values and wait statements. Although an entity is the basic computing element in VHDL, as a procedure is in Ada, an entity's behavior more closely resembles a process than a procedure.


The general goal of a VHDL synthesis activity driven by a VSPEC specification is to synthesize a function, F(x), such that:

$$\forall x : D \cdot I(x) \Rightarrow \exists z : R \cdot O(x, z) \land F(x) = z \qquad (2)$$

where:

- D is the cartesian product of sorts associated with in, inout and buffer parameters.

- R is the cartesian product of sorts associated with out and inout parameters and only those buffer parameters named in the modifies clause.

- I(x) is the input precondition defined in the requires clause.

- O(x, z) is the output postcondition defined in the ensures clause.
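As a minimal sketch, assuming every sort in D is finite so that the domain can be enumerated, goal (2) can be checked exhaustively: for every x in D with I(x) true, F(x) must lie in R and satisfy O(x, F(x)). The example borrows the SR latch relation q' = not(s and qb), qb' = not(r and q) and treats s, r, q and qb as the domain; the helper names (`satisfies_goal`, `nand_latch`, `wrong_latch`) are hypothetical and not part of VSPEC.

```python
from itertools import product
from typing import Callable, Iterable, Set, Tuple

Bits = Tuple[int, ...]


def satisfies_goal(domain: Iterable[Bits], rng: Set[Bits],
                   pre: Callable[[Bits], bool],
                   post: Callable[[Bits, Bits], bool],
                   f: Callable[[Bits], Bits]) -> bool:
    """Goal (2): for all x in D, I(x) implies F(x) is in R and O(x, F(x))."""
    for x in domain:
        if pre(x):
            z = f(x)
            if z not in rng or not post(x, z):
                return False
    return True


# Assumed domain: (s, r, q, qb); assumed range: next values (q', qb').
D = list(product((0, 1), repeat=4))
R = set(product((0, 1), repeat=2))


def pre(x: Bits) -> bool:
    return True  # no requires clause, so I(x) is true


def post(x: Bits, z: Bits) -> bool:
    s, r, q, qb = x
    return z == (1 - (s & qb), 1 - (r & q))  # q' = not(s and qb), qb' = not(r and q)


def nand_latch(x: Bits) -> Bits:
    s, r, q, qb = x
    return (1 - (s & qb), 1 - (r & q))


def wrong_latch(x: Bits) -> Bits:
    s, r, _q, _qb = x  # ignores the stored state
    return (1 - s, 1 - r)


print(satisfies_goal(D, R, pre, post, nand_latch))   # True
print(satisfies_goal(D, R, pre, post, wrong_latch))  # False
```

Exhaustive checking only works for small finite sorts; for realistic types the same obligation has to be discharged by proof, which is what the formal synthesis tools provide.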

4.2  State  Based  Solutions 

VHDL reflects the common view of hardware components as state machines. Unlike a typical subprogram, an entity's local storage is not initialized for each invocation: local variables and some parameters maintain their previous values. Thus, the values of locally defined signals and variables, together with the ports, define the state of the component. In Figure 4, the previous values of q and qb are used to generate the next values.

The structure of a traditional axiomatic specification from Section 1.2 is defined over the inputs and outputs of the specified component. No mention is made of the internal state of the component. The brute force approach would be to alter the entity definition to include state variables as inputs.

State-based system synthesis is achieved by synthesizing a transform that includes anything maintaining its state from one invocation to the next as part of both the domain and the range of the transform. Consider the VSPEC example from Figure 8. To satisfy this VSPEC specification, we must synthesize one or more transforms that collectively satisfy:

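$$\begin{aligned}
D &= \mathit{bit} \times \mathit{bit} \times \mathit{bit} \\
R &= \mathit{bit} \times \mathit{bit} \\
I(x : D) &= R(a, b, s) \\
O(x : D, z : R) &= Q(\langle a, b, s \rangle, x) \land D(\langle a, b, s \rangle, s)
\end{aligned}$$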

The domain and range of the entity being synthesized are different from the domain and range of the synthesis goal: the entity domain and range are both augmented to include the types of state variables. Thus, the synthesized function will produce values both for entity output signals and for the signals and variables maintaining state. The state variables of a VSPEC entity include any variables defined in the state clause and ports declared with mode buffer, out, or inout. The synthesis goal stated earlier is modified such that the domain and range include values of state variables. Appropriate buffer signals have already been included in the original domain.
Note that the single function produces both the state transition and the output. Thus, the resulting system is either a Moore or a Mealy machine, depending on the signals and variables involved in calculating outputs.
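A minimal sketch of the state-augmented transform, assuming the SR latch of Figure 4 with q and qb as its only state: each call maps (inputs, state) to (outputs, next state), and a driver loop feeds the next state back in as the state of the following invocation. Because the outputs here are the state itself, this instance behaves as a Moore machine; computing outputs from the inputs as well would make it a Mealy machine. The function names are illustrative only.

```python
from typing import Callable, List, Tuple

Inputs = Tuple[int, int]  # (s, r)
State = Tuple[int, int]   # (q, qb)


def latch_step(inputs: Inputs, state: State) -> Tuple[State, State]:
    """One invocation of the augmented transform: (inputs, state) -> (outputs, next state)."""
    s, r = inputs
    q, qb = state
    nxt = (1 - (s & qb), 1 - (r & q))
    return nxt, nxt  # the outputs are the state itself (Moore-style)


def run(step: Callable[[Inputs, State], Tuple[State, State]],
        inputs_seq: List[Inputs], state: State) -> List[State]:
    """Feed each next state back in as the state of the following invocation."""
    outputs = []
    for inp in inputs_seq:
        out, state = step(inp, state)
        outputs.append(out)
    return outputs


# s = r = 1 holds the stored value; r = 0 resets (active-low inputs of a NAND latch).
print(run(latch_step, [(1, 1), (1, 0), (1, 0), (1, 1)], (1, 0)))
# [(1, 0), (1, 1), (0, 1), (0, 1)]
```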

4.3  Managing  Concurrency 

Behavioral VHDL heavily utilizes concurrent signal assignment and concurrent processes. Viewed as a sequential program, the specification from Figure 4 is incorrect: q would be updated before qb, overwriting the previous value of q needed to compute qb. This is an example of the classic value swap problem in traditional programming languages. In behavioral VHDL, these assignments occur concurrently, and the specification therefore functions correctly.
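The difference between the two readings can be reproduced in a few lines. In this hypothetical sketch, `concurrent_update` reads both right-hand sides from the old state, as VHDL concurrent assignment does, while `sequential_update` overwrites q before computing qb, as an imperative program would.

```python
from typing import Tuple


def concurrent_update(s: int, r: int, q: int, qb: int) -> Tuple[int, int]:
    """VHDL reading: both right-hand sides use the previous q and qb."""
    return 1 - (s & qb), 1 - (r & q)


def sequential_update(s: int, r: int, q: int, qb: int) -> Tuple[int, int]:
    """Imperative reading: q is overwritten before qb is computed."""
    q = 1 - (s & qb)
    qb = 1 - (r & q)  # uses the NEW q, not the previous one
    return q, qb


print(concurrent_update(0, 1, 0, 1))  # (1, 1)
print(sequential_update(0, 1, 0, 1))  # (1, 0)
```

Python's tuple assignment `q, qb = 1 - (s & qb), 1 - (r & q)` evaluates both right-hand sides before assigning, so it has the concurrent semantics.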

Concurrent assignments and concurrent processes begin from the same state and cause state changes at the same time. Thus, any problem may be decomposed into processes that generate outputs for a subset of the output signals. If no output signal is driven by multiple assignments, and processes do not interact via wait statements or shared local variables, the composition of those processes is trivial. Full advantage of the independence of processes and assignments is taken when partitioning problems.

4.4  Partitioning 

The brute force synthesis approach is to generate an algorithm that accepts an element of D and generates an appropriate element of R. This function is translated into a single VHDL process that executes and updates all output signals. Figure 10 illustrates such a transform.

A more appropriate synthesis method takes advantage of the VHDL process and concurrent signal assignment concepts. The requirements of each function being synthesized are decomposed into requirements for each signal, or requirements for disjoint subsets of the signals defined in the function's range. When evaluating concurrent signal assignments, each signal driver is evaluated independently from the same initial state, and the results are concurrently assigned to outputs. Thus, the evaluation of each driver has no effect on other drivers.
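A minimal sketch of this evaluation rule, assuming signals are held in a dictionary and each driver is a pure function of a snapshot of the current state: all drivers read the same snapshot and their results are committed together, so the order in which drivers are evaluated cannot change the result. The per-signal decomposition of the SR latch shown here is a hypothetical example.

```python
from typing import Callable, Dict, List, Tuple

Signals = Dict[str, int]
# A driver names the signals it assigns and computes them from a state snapshot.
Driver = Tuple[Tuple[str, ...], Callable[[Signals], Tuple[int, ...]]]


def evaluate_drivers(state: Signals, drivers: List[Driver]) -> Signals:
    """Evaluate every driver against the same snapshot, then commit all results at once."""
    snapshot = dict(state)  # every driver reads this copy, never a partial update
    updates: Signals = {}
    for targets, fn in drivers:
        for name, value in zip(targets, fn(snapshot)):
            updates[name] = value
    new_state = dict(state)
    new_state.update(updates)  # concurrent commit
    return new_state


# Hypothetical decomposition of the SR latch into one driver per output signal.
drivers: List[Driver] = [
    (("q",), lambda st: (1 - (st["s"] & st["qb"]),)),
    (("qb",), lambda st: (1 - (st["r"] & st["q"]),)),
]

start = {"s": 0, "r": 1, "q": 0, "qb": 1}
print(evaluate_drivers(start, drivers))  # {'s': 0, 'r': 1, 'q': 1, 'qb': 1}
print(evaluate_drivers(start, drivers[::-1]) == evaluate_drivers(start, drivers))  # True
```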

When synthesizing functions for drivers, full advantage is taken of this independence. The synthesis obligation is decomposed into several simpler obligations for subsets of the output signals. These functions are composed as processes in an architecture. The composition will be correct if: (a) each output signal appears on the left side of an assignment in only one driver; (b) the conjunction of postconditions from each driver satisfies the overall postcondition; and (c) satisfying the overall input condition implies that the input condition of each driver is satisfied.
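Condition (a) is a purely structural check and can be sketched as follows, assuming each driver is summarized by the list of signals it assigns; the driver and signal names are hypothetical. Conditions (b) and (c) are semantic and require reasoning about the pre- and postconditions themselves.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def multiply_driven(driver_outputs: Dict[str, Iterable[str]]) -> Set[str]:
    """Return signals assigned by more than one driver; empty means condition (a) holds."""
    counts = Counter(sig for outs in driver_outputs.values() for sig in set(outs))
    return {sig for sig, n in counts.items() if n > 1}


ok = {"b1": ["z1", "z2"], "b2": ["z3"]}
bad = {"b1": ["z1", "z2"], "b2": ["z2", "z3"]}
print(multiply_driven(ok))   # set()
print(multiply_driven(bad))  # {'z2'}
```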

Formally, synthesizing algorithms for collections of signals involves generating the set of functions $f_1(x_1), \ldots, f_n(x_n)$, where $D_k$, $R_k$, $I_k(x_k)$, $O_k(x_k, z_k)$ and $C_k$ describe the domain, range, input condition, output condition and constraints of $f_k(x_k)$. The following two conditions must also hold:


$$I(x) \Rightarrow \bigwedge_{k=1}^{n} I_k(x_k) \qquad (3)$$

$$\bigwedge_{k=1}^{n} O_k(x_k, z_k) \Rightarrow O(x, z) \qquad (4)$$

Equation 3 assures that if the overall input precondition is met, the individual driver preconditions are also met. If this were not the case, a function could fail even when the precondition of the overall entity is met. Equation 4 assures that if each driver postcondition is satisfied, the overall output condition is satisfied. If this were not the case, the collection of synthesized functions would not necessarily generate all necessary output values.
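For finite domains, Equations 3 and 4 can be checked by enumeration. The sketch below assumes a hypothetical decomposition of the SR latch into two drivers, one for q and one for qb, where each driver k sees a projection x_k of x and produces one component z_k of z. Dropping the qb driver makes Equation 4 fail, because qb' is then unconstrained.

```python
from itertools import product
from typing import Callable, List, Tuple

BITS = (0, 1)
D = list(product(BITS, repeat=4))  # x = (s, r, q, qb)
R = list(product(BITS, repeat=2))  # z = (q', qb')


def pre(x) -> bool:      # I(x): no requires clause
    return True


def post(x, z) -> bool:  # O(x, z)
    s, r, q, qb = x
    return z == (1 - (s & qb), 1 - (r & q))


# Each driver: (projection x -> x_k, I_k, O_k, index of z_k within z).
Driver = Tuple[Callable, Callable, Callable, int]
drivers: List[Driver] = [
    (lambda x: (x[0], x[3]), lambda xk: True, lambda xk, zk: zk == 1 - (xk[0] & xk[1]), 0),  # q
    (lambda x: (x[1], x[2]), lambda xk: True, lambda xk, zk: zk == 1 - (xk[0] & xk[1]), 1),  # qb
]


def condition_3(domain, pre, drivers) -> bool:
    """Equation 3: I(x) implies every I_k(x_k)."""
    return all(ik(proj(x)) for x in domain if pre(x) for proj, ik, _, _ in drivers)


def condition_4(domain, rng, post, drivers) -> bool:
    """Equation 4: the conjunction of every O_k(x_k, z_k) implies O(x, z)."""
    return all(post(x, z) for x, z in product(domain, rng)
               if all(ok(proj(x), z[i]) for proj, _, ok, i in drivers))


print(condition_3(D, pre, drivers), condition_4(D, R, post, drivers))  # True True
print(condition_4(D, R, post, drivers[:1]))  # False: nothing constrains qb'
```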

If the mapping from each output or state variable to the function that generates it is injective (that is, each function generates exactly one output), then a driver is synthesized for each output, and concurrent signal assignments are used to assemble the drivers into a single component. Otherwise, a process and the necessary local storage are created for each driver. Both options are shown in Figure 11.
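The choice between the two assembly styles can be sketched as a small code generator. This is a hypothetical emitter, not the tool described here: it takes a map from each synthesized function to its argument and output signals, emits one concurrent signal assignment per output when every function generates exactly one output, and otherwise emits one process with local variables per function, in the style of Figure 11. The `bit` type used for the temporaries is an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical input: function name -> (argument signals, output signals it generates).
Drivers = Dict[str, Tuple[List[str], List[str]]]


def assemble(arch: str, entity: str, drivers: Drivers, tmp_type: str = "bit") -> str:
    """Emit a VHDL architecture skeleton in the style of Figure 11."""
    one_output_each = all(len(outs) == 1 for _, outs in drivers.values())
    lines = [f"architecture {arch} of {entity} is", "begin"]
    for k, (fn, (args, outs)) in enumerate(drivers.items(), start=1):
        arglist = ", ".join(args)
        if one_output_each:
            # Injective case: one concurrent signal assignment per output.
            lines.append(f"  {outs[0]} <= {fn}({arglist});")
            continue
        # General case: one process per driver, with local variables for its results.
        tmps = [f"tmp_{o}" for o in outs]
        lines.append(f"  b{k}: process ({arglist})")
        lines.append(f"    variable {', '.join(tmps)} : {tmp_type};")
        lines.append("  begin")
        lines.append(f"    {fn}({', '.join(args + tmps)});")
        lines.extend(f"    {o} <= {t};" for o, t in zip(outs, tmps))
        lines.append("  end process;")
    lines.append(f"end {arch};")
    return "\n".join(lines)


print(assemble("concur", "test", {"f1": (["x1"], ["z1"]), "f2": (["x2"], ["z2"])}))
print(assemble("proc", "test", {"f1": (["x1"], ["z1", "z2"]), "f2": (["x2"], ["z3"])}))
```

The first call produces the concurrent form (`z1 <= f1(x1);`, `z2 <= f2(x2);`); the second falls back to one process per function because f1 generates two outputs.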

It should be noted that each synthesized driver functions independently. From a synthesis perspective, this eliminates the need to verify that no harmful interactions occur between drivers. However, a system is rarely developed as a collection of independent components. To synthesize realistic VHDL systems, multi-component systems with realistic degrees of interaction must be synthesized.
4.5 Signal Attributes

Software systems deal primarily with stable variable values. Hardware representation systems must represent not only signal values, but also how those values change. Consider a device with a leading-edge driven clock. If the clock is viewed as a binary value, only two states can be represented, and the edge itself cannot. VHDL provides function attributes of the form sym'att, where sym is a defined symbol name and att is an attribute defined for that symbol. Attributes such as delayed, stable, quiet, and transaction are defined for all symbols representing signals. For example, the event attribute returns true if its associated signal has just changed value. Thus, the following VHDL expression represents the condition for an event that should occur on the rising edge of signal clk:

clk = '1' and clk'event

Managing signal attributes appears to be a difficult problem. However, defining predicates and theories for the needed attributes supports their inclusion in the specification process. The clk'event attribute reference can easily be represented as the predicate event(clk). Furthermore, adjusting the syntax of the interface language allows the attribute to be specified using the VHDL form. Figure 12 shows an adaptation of the SR latch specification that includes a clock signal and forms an edge-triggered flip-flop.
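As a minimal sketch, assuming the clock is observed as a sequence of sampled values, the event(clk) predicate can be modeled as "the value differs from the previous sample", and the ensures clause of Figure 12 as a predicate over current and next values (q_next and qb_next stand for q' and qb'). This only approximates VHDL's simulation-cycle semantics for 'event; all names are hypothetical.

```python
def event(prev: int, curr: int) -> bool:
    """Approximates clk'event on sampled values: the signal changed since the last sample."""
    return prev != curr


def rising_edge(prev: int, curr: int) -> bool:
    """clk = '1' and clk'event"""
    return curr == 1 and event(prev, curr)


def re_sr_latch_ok(prev_clk: int, clk: int, s: int, r: int, q: int, qb: int,
                   q_next: int, qb_next: int) -> bool:
    """Ensures clause of Figure 12; q_next and qb_next stand for q' and qb'."""
    fires = (rising_edge(prev_clk, clk)
             and q_next == 1 - (s & qb)
             and qb_next == 1 - (r & q))
    holds = q_next == q and qb_next == qb
    return fires or holds


clk = [0, 1, 1, 0, 1]
print([rising_edge(a, b) for a, b in zip(clk, clk[1:])])  # [True, False, False, True]
print(re_sr_latch_ok(0, 1, 0, 1, 0, 1, 1, 1))  # True: state changes on a rising edge
print(re_sr_latch_ok(1, 1, 0, 1, 0, 1, 1, 1))  # False: state changes without an edge
```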

Although  some  VHDL  characteristics  may  not  feel 
natural  to  a  traditional  programmer,  they  are  quite 
entity sr_latch is
  port (s, r : in bit;
        q, qb : buffer bit)
  ensures
    q' = ~(s and qb) and
    qb' = ~(r and q)
end sr_latch;

function sr_latch(s, r, q, qb: bit)
  : tuple(bit, bit)
  <not(s and qb), not(r and q)>

architecture mono of sr_latch is
  procedure sr_char(s, r, q, qb: in bit;
                    nq, nqb: out bit) is
  begin
    nq := not(qb and s);
    nqb := not(q and r);
  end sr_char;
begin
  b1: process (s, r, q, qb)  -- re-run whenever an input or state signal changes
    variable nq, nqb: bit;
  begin
    sr_char(s, r, q, qb, nq, nqb);
    q <= nq;
    qb <= nqb;
  end process;
end mono;


Figure 10: Monolithic algorithm synthesis for a single VHDL entity.


architecture proc of test is
begin
  b1: process (x1)
    variable tmp1, tmp2, ..., tmpk : R1;
  begin
    f1(x1, tmp1, tmp2, ..., tmpk);
    z1 <= tmp1;
    z2 <= tmp2;
    ...
    zk <= tmpk;
  end process;

  b2: process (x2)
    variable tmp(k+1), tmp(k+2), ..., tmpj : R2;
  begin
    f2(x2, tmp(k+1), tmp(k+2), ..., tmpj);
    z(k+1) <= tmp(k+1);
    z(k+2) <= tmp(k+2);
    ...
    zj <= tmpj;
  end process;

  ...

  bm: process (xm)
    variable tmp(n-1), tmpn : Rm;
  begin
    fm(xm, tmp(n-1), tmpn);
    z(n-1) <= tmp(n-1);
    zn <= tmpn;
  end process;
end proc;

architecture concur of test is
begin
  z1 <= f1(x1);
  z2 <= f2(x2);
  ...
  zn <= fn(xn);
end concur;


Figure 11: Assembling multiple algorithms into a single component using processes and concurrent signal assignment.
entity re_sr_latch is
  port (s, r : in bit;
        q, qb : buffer bit;
        clk : in bit)
  ensures
    (clk = 1 and event(clk) and
     q' = ~(s and qb) and
     qb' = ~(r and q))
    or
    (q' = q and qb' = qb)
end re_sr_latch;


Figure 12: SR latch specification modified to specify a rising-edge-triggered SR flip-flop.


natural to hardware designers. In addition, each can be specified using traditional axiomatic specification techniques. However, any successful VSPEC-related tool must be usable by hardware designers. Thus, VSPEC must support both the specification of such characteristics and the synthesis of VHDL that exhibits them. This is the primary reason for using a Larch interface language: the designer works in a language supporting traditional hardware specification techniques and familiar constructs that have a formal interpretation.

5  Synthesis  Techniques 

Given the result of parsing a VSPEC entity, one may employ any number of synthesis techniques to derive an algorithm for the transform. In this work, formal synthesis techniques are employed: specifically, a case-based reasoning approach based on the operator-match problem-solving technique of CYPRESS [12], together with an interface to KIDS [14].

5.1 Algorithm Synthesis

5.1.1 Direct Transformation

The simplest algorithm specification technique available is direct transformation. It should be used when … The specification is transformed using simple syntactic techniques to generate VHDL code. This technique is similar to the specification-to-code option available in KIDS and other automatic programming systems.

5.1.2 Case-Based Reasoning

Case-based reasoning [10] uses similarities between problems to select solutions. The assumption is that similarity between problems implies similarity between solutions. In the BENTON system, case-based reasoning is used to retrieve and reuse specifications. The same techniques are used to retrieve and reuse VHDL fragments described by VSPEC specifications: VSPEC is used to generate features from both the problem and potential solutions for similarity calculation. Case-based reasoning is useful primarily for retrieving potential solutions, but it does not guarantee the validity of the solution. Thus, after the solution is retrieved and adapted, a further proof obligation exists. VSPEC makes this obligation simpler and supports correctness-preserving adaptation using derived antecedents, but it does not avoid the obligation.
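A minimal sketch of the retrieval step, assuming each stored VHDL fragment is indexed by a set of features extracted from its VSPEC specification (port types and predicates used) and that Jaccard similarity is an adequate measure. Both the feature sets and the similarity measure are assumptions, not the BENTON or VSPEC implementation.

```python
from typing import Dict, FrozenSet, List, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Similarity of two feature sets: 1.0 when identical, 0.0 when disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def retrieve(problem: FrozenSet[str], library: Dict[str, FrozenSet[str]],
             k: int = 1) -> List[Tuple[str, float]]:
    """Rank stored fragments by the similarity of their specification features."""
    scored = [(name, jaccard(problem, feats)) for name, feats in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical feature sets extracted from VSPEC port types and predicates.
library = {
    "bin_search_arch": frozenset({"array", "integer", "sorted", "member"}),
    "sort_arch": frozenset({"array", "integer", "sorted", "bag"}),
    "sr_latch_arch": frozenset({"bit", "not", "and"}),
}
problem = frozenset({"array", "integer", "member"})
print(retrieve(problem, library))  # [('bin_search_arch', 0.75)]
```

The best match here requires sorted(input), which the problem does not guarantee: retrieval alone says nothing about validity, which is exactly the proof obligation that remains after adaptation.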

5.1.3  Formal  Transformation 

The chief algorithm synthesis tool is KIDS [14, 13]. KIDS is based on the formal composition of an algorithm theory representing a problem-solving methodology and a domain theory representing the problem itself. The DRIO specification generated by the VSPEC parser is motivated chiefly by the specification format used by KIDS; however, other synthesis systems frequently use similar means for representing specifications [8].

In order to generate algorithms using KIDS, one must specify a complete domain theory, of which the specification itself is only a part. One must also use Refine to specify laws and auxiliary functions that define the transform itself. The transformation from the interface language to Refine accomplishes some of this, along with libraries of general theories describing operators over types. In general, specifications beyond those directly given in VSPEC are required for the synthesis process to complete effectively.
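The separation between an algorithm theory and a domain theory can be illustrated, informally, with a divide-and-conquer schema. In this sketch the generic `divide_and_conquer` function plays the role of the algorithm theory, and the operators passed to it (a hypothetical instantiation for sorting) play the role of the problem-specific part. KIDS derives such instantiations formally and proves them correct; this code only shows the shape of the composition.

```python
from typing import Callable, List, Tuple, TypeVar

X = TypeVar("X")
Z = TypeVar("Z")


def divide_and_conquer(primitive: Callable[[X], bool],
                       direct: Callable[[X], Z],
                       decompose: Callable[[X], Tuple[X, X]],
                       compose: Callable[[Z, Z], Z]) -> Callable[[X], Z]:
    """Generic problem-solving schema (the 'algorithm theory' role)."""
    def solve(x: X) -> Z:
        if primitive(x):
            return direct(x)
        left, right = decompose(x)
        return compose(solve(left), solve(right))
    return solve


def merge(a: List[int], b: List[int]) -> List[int]:
    """Combine two sorted lists into one sorted list."""
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:]


# Hypothetical domain-specific operators for sorting (the 'domain theory' role).
mergesort = divide_and_conquer(
    primitive=lambda xs: len(xs) <= 1,
    direct=lambda xs: list(xs),
    decompose=lambda xs: (xs[: len(xs) // 2], xs[len(xs) // 2:]),
    compose=merge,
)
print(mergesort([5, 2, 9, 1, 5]))  # [1, 2, 5, 5, 9]
```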

5.2  Architecture  Synthesis 

The most active area of this research is the synthesis of multi-component systems: given a high-level VSPEC specification, generate a system involving a collection of interconnected entities rather than simply a collection of independent processes.

5.2.1  Case-Based  Reasoning 

The simplest technique currently used applies case-based reasoning to retrieve and adapt multi-process architectures. Architectures take the form of procedural networks, with each action representing a single component. In a typical procedural network, an action is represented by a precondition and a postcondition; thus, the representation adapts naturally to the specification of some architectures. Actions are specialized using algorithmic synthesis techniques, antecedent derivation, or heuristic adaptation. As before, the results of some adaptation processes require that the resultant algorithm be verified. If the architecture is known to be correct and is specialized using correctness-preserving operations, verification is typically not required.
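A minimal sketch of checking a sequential procedural network, assuming preconditions and postconditions can be approximated by sets of named facts and that facts established earlier are never invalidated (a monotonic simplification). The action names and facts are hypothetical; the check reports the first action whose precondition is not established by the initial facts and the postconditions of earlier actions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Action:
    name: str
    pre: FrozenSet[str]   # facts the component requires
    post: FrozenSet[str]  # facts the component ensures


def first_unsupported(network: List[Action], initial: FrozenSet[str]) -> Optional[str]:
    """Return the first action whose precondition is not established, or None."""
    facts = set(initial)
    for action in network:
        if not action.pre <= facts:
            return action.name
        facts |= action.post  # monotonic: earlier facts are never retracted
    return None


sorter = Action("sorter", frozenset(), frozenset({"sorted(y)"}))
bin_search = Action("bin_search", frozenset({"sorted(y)"}), frozenset({"member(z, y)"}))

print(first_unsupported([sorter, bin_search], frozenset()))  # None
print(first_unsupported([bin_search], frozenset()))          # bin_search
```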

5.2.2  Formal  Synthesis 

General architectures can be synthesized using the KIDS approach by developing algorithm theories to support architecture synthesis and by using antecedent derivation to discover missing components [3]. The batch sequential architecture for the search entity shown in Figure 9 can be synthesized by selecting the binary search algorithm and using its precondition to derive the sort entity.

The binary search algorithm takes a key value and a list of elements and returns the value discovered in the list. The precondition of binary search is that the input list must be ordered. Other preconditions may also be derived to fit this algorithm to the problem. There is no precondition associated with the search entity driving the synthesis process, so there is no assurance that the collection of inputs will be in order. Thus, a component must be developed to prepare the original input for use by the binary search routine. This technique is very similar to techniques used by CYPRESS and KIDS to synthesize divide-and-conquer algorithms [12]. The specification of this new component will be as follows:

- $D = D_s$

- $R = D_{bs}$

- $I(x) = I_s(x)$

- $O(x, z) = I_{bs}(z)$

where $D_s, R_s, \ldots$ are associated with the original search specification and $D_{bs}, R_{bs}, \ldots$ are associated with the binary search specification. The resultant specification is almost the sorting specification: it has no precondition and a sorted output condition. Note, however, that the bag(x) = bag(z) conjunct is missing from the generated specification, so nothing requires the output to be a permutation of the input.
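The weakness of the derived specification can be demonstrated concretely. In this sketch, which assumes integer lists and uses Python's standard `bisect` module for the binary search, a component that always returns an empty list satisfies the derived output condition but not the full sorting specification, while composing a genuine sort with binary search yields the batch sequential search of Figure 9. All function names are illustrative, and returning None for a missing key is an assumption the original specification does not address.

```python
from bisect import bisect_left
from collections import Counter
from typing import List, Optional


def is_sorted(xs: List[int]) -> bool:
    return all(a <= b for a, b in zip(xs, xs[1:]))


def derived_post(x: List[int], z: List[int]) -> bool:
    """O(x, z) = I_bs(z): the output only has to be sorted."""
    return is_sorted(z)


def sort_post(x: List[int], z: List[int]) -> bool:
    """Full sorting postcondition: sorted(z) and bag(x) = bag(z)."""
    return is_sorted(z) and Counter(x) == Counter(z)


def degenerate(x: List[int]) -> List[int]:
    return []  # meets derived_post for every input, but discards the data


def bin_search(xs: List[int], key: int) -> Optional[int]:
    """Requires is_sorted(xs); returns key if present, else None (an assumption)."""
    i = bisect_left(xs, key)
    return xs[i] if i < len(xs) and xs[i] == key else None


def search(xs: List[int], key: int) -> Optional[int]:
    """Batch sequential composition: sorting establishes bin_search's precondition."""
    return bin_search(sorted(xs), key)


x = [7, 3, 9]
print(derived_post(x, degenerate(x)), sort_post(x, degenerate(x)))  # True False
print(search(x, 9), search(x, 4))                                   # 9 None
```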

Arbitrarily complex sequences of entities may be specified in this manner by: (a) repeating the batch sequential process for discovered components; and (b) developing similar techniques for batch parallel and conditional branching. Note that a control strategy is not proposed here. The user must make control decisions at each synthesis stage. Thus, problems associated with some planning algorithms can be avoided.

5.2.3  Non-Sequential  Architectures 

The antecedent derivation techniques can effectively generate architectures where a clear order of execution exists and components do not engage in bidirectional communication.³ Consider, for example, the specification of a pair of transceivers or synchronizing devices, where neither component simply precedes the other.

In order to synthesize such systems using KIDS techniques, general algorithm theories must be developed describing various architectures. Antecedent derivation is useful even in these situations, but discovering missing components should eventually give way to specializing known architectures to specific problems.


³ It has not been determined that antecedent derivation cannot be used for such situations; it simply has not been demonstrated that it can.

6 Related Work

The approach taken in constructing the VSPEC language is based heavily on the Larch/Modula-3 interface language [7]. Another parallel effort in developing a Larch interface language is underway at Odyssey Research Associates [6]. This interface language uses the Larch Shared Language rather than Refine and is targeted towards formal VHDL verification, not synthesis. The techniques used in this effort are being extended from Penelope [4], an Ada verification system. VSPEC could potentially support verification; however, its prime motivation is driving synthesis processes.

Most of the synthesis aspects of this work, and the specification of components in terms of domain, range, input precondition and output postcondition, are based on the application of algorithm theories to program synthesis. These techniques were proposed by Smith [13] and implemented in the CYPRESS [12] and KIDS [14] systems.


7  Future  Directions 

Three directions currently dominate this research effort: (1) development of KIDS algorithm theories for general architectures; (2) management of constraints during the design activity; and (3) migration of the general technique away from VHDL.

An algorithm theory represents a general problem-solving technique. Using a multi-component architecture is one such general technique; however, no theory exists for its application in the current KIDS system. To extend these techniques to larger systems, general algorithm theories must be developed. Proposed techniques for batch sequential systems are shown here, and similar techniques are proposed for batch parallel systems. However, theories for more complex architectures must still be developed, particularly for communicating systems.

Currently, VSPEC represents several types of constraints. At each stage in the design process, these constraints can be checked in the abstract syntax tree, so constraint violations can be detected. Management of propagation time is particularly difficult. Odyssey Research Associates [6] takes the approach of associating events with time points in the interface language, which requires using a temporal logic in the verification activity. VSPEC instead uses an interval representation to define the time from input signal(s) arrival to output signal(s) generation. This separates timing issues from the functional specification. Although timing constraints can be verified, they must eventually be included in the synthesis process itself.
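A minimal sketch of the interval idea, assuming a constraint is a closed interval [min, max] on the delay from input arrival to output generation and that a simulation trace supplies (arrival, output) time pairs. The class, the nanosecond units, and the trace format are all hypothetical. The checker never looks at signal values, which mirrors the separation of timing from the functional specification.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Interval:
    """Allowed delay from input arrival to output generation (hypothetical units: ns)."""
    min_ns: float
    max_ns: float

    def admits(self, arrival_ns: float, output_ns: float) -> bool:
        return self.min_ns <= output_ns - arrival_ns <= self.max_ns


def violations(constraint: Interval, events: List[Tuple[float, float]]) -> List[int]:
    """Indices of (arrival, output) pairs whose delay lies outside the interval."""
    return [i for i, (a, o) in enumerate(events) if not constraint.admits(a, o)]


prop_delay = Interval(min_ns=2.0, max_ns=10.0)
print(violations(prop_delay, [(0.0, 5.0), (20.0, 31.5), (40.0, 41.0)]))  # [1, 2]
```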

Finally, the Generic Abstract Syntax Tree (GAST) is being developed to serve as a general representation for systems requirements. The objective is either to adapt VSPEC to new source languages or to use existing Larch interface languages to generate GAST requirement representations. A parser is written to generate GAST from each language, and synthesis (and potentially analysis) tasks are performed on the GAST representation. The resulting algorithms plus the GAST representation are transformed into the output language of choice. Note that both the input parsing and the output transformation are purely syntactic activities, so existing technologies can be used to construct these components. By taking this approach, VSPEC techniques may be more generally inserted into the systems development and prototyping process.
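A minimal sketch of a purely syntactic back end, assuming a hypothetical GAST record that holds an entity name, its ports, and requires/ensures predicates; the actual GAST schema is not given here. Emitting VSPEC-style text from the record needs no semantic knowledge, which is what allows existing technologies to be reused for the input and output stages.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class GastEntity:
    """Hypothetical language-neutral requirements record (not the actual GAST schema)."""
    name: str
    ports: List[Tuple[str, str, str]]  # (name, mode, type)
    requires: List[str] = field(default_factory=list)
    ensures: List[str] = field(default_factory=list)


def emit_vspec(e: GastEntity) -> str:
    """Purely syntactic back end: GAST record -> VSPEC-style entity text."""
    ports = ";\n        ".join(f"{n}: {m} {t}" for n, m, t in e.ports)
    lines = [f"entity {e.name} is", f"  port ({ports});"]
    if e.requires:
        lines.append(f"  requires {' and '.join(e.requires)};")
    if e.ensures:
        lines.append(f"  ensures {' and '.join(e.ensures)};")
    lines.append(f"end {e.name};")
    return "\n".join(lines)


spec = GastEntity("bin_search",
                  [("input", "in", "array of integer"), ("key", "in", "integer"),
                   ("value", "out", "integer")],
                  requires=["sorted(input)"])
print(emit_vspec(spec))
```

A second emitter for another target language would consume the same record unchanged, which is the point of routing every front end and back end through one intermediate form.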